TurboID/miniTurbo proximity labeling can generate long protein tables fast. The hard part is turning those tables into a defensible set of enriched proximal proteins—and then turning that set into biology you can explain in a figure legend.
This guide is built for consideration-stage readers who already know the basics of LC–MS/MS and are comparing workflows and tools. It focuses on the downstream decisions that control false positives: identification and quantification choices, control design, filtering thresholds, and how to do enrichment and network analysis without bias.
Key Takeaway: In TurboID, "a protein identified" is not the same as "a proximal protein you should interpret." Your TurboID mass spectrometry data analysis lives or dies by controls, reproducibility, and a transparent filtering rule.
TurboID and miniTurbo label proteins near your bait in living cells, and you typically enrich biotinylated proteins with streptavidin before LC–MS/MS. That means your primary output is an enriched proteomics dataset—strongly shaped by what binds streptavidin beads, what's endogenously biotinylated, and what your control samples look like.
Two consequences follow:
If you want a method overview of how proximity labeling fits relative to related approaches, see proximity labeling techniques.
Input: a folder of raw MS files + your experimental design notes.
Action: Create a single sample sheet capturing, at minimum:
Output: a sample metadata table you can hand to an analyst and later include in supplementary methods.
Done when: every raw file maps to exactly one row, and you can write your bait-vs-control contrasts unambiguously.
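The "done when" rule above can be checked mechanically. The following is a minimal sketch, assuming a sample sheet held as a list of dictionaries; the column names (`raw_file`, `condition`, `replicate`) and file names are hypothetical placeholders, not a required schema.

```python
from collections import Counter

# Hypothetical sample sheet rows; column names are illustrative assumptions.
sample_sheet = [
    {"raw_file": "bait_rep1.raw", "condition": "bait", "replicate": 1},
    {"raw_file": "bait_rep2.raw", "condition": "bait", "replicate": 2},
    {"raw_file": "ctrl_rep1.raw", "condition": "control", "replicate": 1},
    {"raw_file": "ctrl_rep2.raw", "condition": "control", "replicate": 2},
]
# Hypothetical listing of the raw-file folder.
raw_files_on_disk = ["bait_rep1.raw", "bait_rep2.raw", "ctrl_rep1.raw", "ctrl_rep2.raw"]


def check_sample_sheet(rows, raw_files):
    """Return a list of problems; an empty list means every file maps to exactly one row."""
    counts = Counter(row["raw_file"] for row in rows)
    problems = []
    # Every raw file must appear in exactly one row.
    for name in sorted(set(raw_files)):
        n = counts.get(name, 0)
        if n != 1:
            problems.append(f"{name}: {n} rows in sample sheet (expected 1)")
    # Rows that point at files that do not exist are also errors.
    for name in sorted(set(counts) - set(raw_files)):
        problems.append(f"{name}: listed in sample sheet but no raw file found")
    return problems


problems = check_sample_sheet(sample_sheet, raw_files_on_disk)
```

An empty `problems` list is a reasonable gate before any search or quantification job is started.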
Input: your question ("who is proximal to my bait in compartment X under condition Y?").
Action: Decide your negative controls before touching a volcano plot. Most analysis pain comes from controls that don't match the bait's labeling environment.
A practical control matrix:
| Control class | What it controls for | When it's essential | What it won't fix |
| --- | --- | --- | --- |
| No-biotin (omit exogenous biotin) | baseline labeling / endogenous biotin effects | when background is high or biotin handling varies | compartment mismatch |
| TurboID-only / miniTurbo-only (no bait) | ligase-driven nonspecific labeling | overexpression systems; comparing TurboID vs miniTurbo | bait expression artifacts |
| Localization-matched control (e.g., YFP-TurboID targeted to same compartment) | compartment "neighborhood" baseline | almost always; especially for nucleus/ER/mitochondria | nonspecific bead binders |
| Catalytically dead ligase fusion | biotin-independent artifacts | when toxicity/mislocalization suspected | endogenous biotinylated proteins |
A plant-focused analysis guide explicitly recommends a negative control TurboID localized to the same compartment as the bait (e.g., NLS-tagged control for nuclear baits), and filtering interactomes by exclusivity or high fold enrichment over that control across multiple experiments (see A brief guide to analyzing TurboID-based proximity labeling–mass spectrometry datasets (2025)).
Output: a control decision you can explain in one sentence.
Done when: you can point to at least one control that shares the bait's compartment and enrichment workflow.
⚠️ Warning: If your control does not localize similarly to your bait, downstream "significance" often reflects localization differences—not biology.
Input: raw MS files.
Action: Identify peptides/proteins in your search engine (e.g., MaxQuant or Proteome Discoverer) with standard proteomics hygiene:
Output: a protein-level table with accession IDs, gene symbols, peptide counts, and quant values per sample.
Done when: you can answer "Which protein IDs are supported by sufficient peptide evidence?" without manual patching.
Input: identification results.
Action: Confirm your quant strategy and its implications:
Output: a matrix of quantitative values per protein per sample (e.g., LFQ intensities or reporter-ion intensities).
Done when: replicate distributions look comparable after normalization, and obvious loading artifacts are corrected.
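One simple way to make replicate distributions comparable is median centering on the log2 scale. The sketch below is an illustration under stated assumptions, not a prescribed method: the intensity values and sample names are made up, and zeros are treated as missing. In enrichment experiments it is usually safer to apply this within replicates of the same condition, because bait and control pull-downs can legitimately differ in total signal.

```python
import math
from statistics import median

# Hypothetical intensities: protein -> value per sample (None = missing).
intensities = {
    "P1": {"bait_1": 800.0, "bait_2": 1600.0},
    "P2": {"bait_1": 400.0, "bait_2": 800.0},
    "P3": {"bait_1": 200.0, "bait_2": None},
}


def median_normalize(matrix):
    """Log2-transform, then shift each sample so all sample medians coincide."""
    samples = sorted({s for row in matrix.values() for s in row})
    # Treat None and 0 as missing rather than taking log2(0).
    logged = {
        p: {s: (math.log2(row[s]) if row.get(s) else None) for s in samples}
        for p, row in matrix.items()
    }
    sample_medians = {
        s: median(v[s] for v in logged.values() if v[s] is not None)
        for s in samples
    }
    # Shift every sample to a common target (the median of the sample medians).
    target = median(sample_medians.values())
    return {
        p: {
            s: (None if x is None else x - sample_medians[s] + target)
            for s, x in row.items()
        }
        for p, row in logged.items()
    }


normalized = median_normalize(intensities)
```

Missing values stay missing here; whether and how to impute them is a separate decision that should be stated in the methods.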
Input: quantitative matrix.
Action: Run a small set of QC checks before hypothesis testing:
Output: a short QC note (even a screenshot folder) that explains why the dataset is analyzable.
Done when: you're confident the biggest differences aren't "day-of-prep" or "column A vs column B."
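Two QC numbers that are cheap to compute are the missing-value fraction per sample and the pairwise correlation between replicates. The sketch below assumes log2 values in matching protein order with `None` for missing entries; the example values are invented, and any pass/fail threshold is a project-specific judgment rather than a standard.

```python
import math


def missing_fraction(values):
    """Fraction of proteins with no quantitative value in one sample."""
    return sum(v is None for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation over proteins quantified in both samples."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 shared proteins")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant values")
    return cov / math.sqrt(vx * vy)


# Hypothetical log2 intensities for two replicates of the same condition.
rep1 = [20.1, 18.4, None, 22.0, 19.5]
rep2 = [20.3, 18.0, 17.2, 21.8, None]

r = pearson(rep1, rep2)
missing_rep1 = missing_fraction(rep1)
```

If replicates of the same condition correlate worse with each other than with samples from another prep day or column, that is the batch signal this step is meant to catch.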
Input: QC-passed protein table.
Action: Apply a staged filtering strategy so you can tune stringency based on your biology.
A practical ladder:
1. Technical confidence filter
2. Presence/reproducibility filter
3. Enrichment filter vs matched control
4. Statistics / FDR filter
5. Biology-informed filters (use carefully)
Output: at least two lists:
Done when: you can explain, in one paragraph, why a protein is in or out.
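The ladder can be expressed as a function that reports the first stage a protein fails, which makes the "why is this protein in or out" paragraph easy to write. This is a sketch under assumptions: the thresholds (2 peptides, 2 bait replicates, log2 fold-change of 1, q of 0.05), the field names, and the example proteins are all illustrative, and the p-values are assumed to come from an upstream bait-vs-control test. The FDR step uses the Benjamini-Hochberg adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical protein table; field names and values are assumptions.
proteins = [
    {"id": "A", "peptides": 5, "bait_reps": 3, "log2fc": 3.2, "p": 0.001},
    {"id": "B", "peptides": 1, "bait_reps": 3, "log2fc": 4.0, "p": 0.002},
    {"id": "C", "peptides": 4, "bait_reps": 1, "log2fc": 2.5, "p": 0.020},
    {"id": "D", "peptides": 6, "bait_reps": 3, "log2fc": 0.4, "p": 0.030},
    {"id": "E", "peptides": 3, "bait_reps": 2, "log2fc": 1.8, "p": 0.200},
]


def first_failed_stage(protein, q, min_peptides=2, min_reps=2, min_log2fc=1.0, max_q=0.05):
    """Return the name of the first failed stage, or None if the protein passes all."""
    if protein["peptides"] < min_peptides:
        return "technical"
    if protein["bait_reps"] < min_reps:
        return "reproducibility"
    if protein["log2fc"] < min_log2fc:
        return "enrichment"
    if q > max_q:
        return "fdr"
    return None


qvalues = benjamini_hochberg([p["p"] for p in proteins])
decisions = {p["id"]: first_failed_stage(p, q) for p, q in zip(proteins, qvalues)}
```

Keeping the failure reason per protein, rather than only the surviving list, lets you loosen or tighten one stage at a time and see exactly which proteins move.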
If you have a strong prior about what should and should not appear (common in organelle/compartment mapping), ROC analysis is a principled way to pick cutoffs.
The original TurboID paper describes defining true-positive and false-positive protein sets for a compartment and selecting cutoffs that maximize separation, then intersecting replicates to yield final proteomes (Efficient proximity labeling in living cells and organisms with TurboID (2018)).
In practice, you can operationalize this without heavy statistics jargon:
Output: a cutoff justified by marker behavior, not vibes.
Done when: marker enrichment tracks your chosen threshold.
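A marker-based cutoff can be sketched as follows. The cited paper is described above only as choosing cutoffs that "maximize separation"; the criterion used here, true-positive rate minus false-positive rate (Youden's J), is one common stand-in and is an assumption of this example. The marker names and scores are invented.

```python
def best_cutoff(scores, true_pos, false_pos):
    """Pick the score cutoff that maximizes TPR - FPR over the marker proteins.

    scores: protein -> enrichment score (e.g., log2 fold-change vs control)
    true_pos / false_pos: marker sets expected in / absent from the compartment
    Returns (cutoff, j, tpr, fpr); a protein is called positive if score >= cutoff.
    """
    tp_scores = [scores[p] for p in true_pos if p in scores]
    fp_scores = [scores[p] for p in false_pos if p in scores]
    if not tp_scores or not fp_scores:
        raise ValueError("need at least one quantified marker in each set")
    best = None
    # Only marker scores can change TPR or FPR, so they are the candidates.
    for cutoff in sorted(set(tp_scores + fp_scores)):
        tpr = sum(s >= cutoff for s in tp_scores) / len(tp_scores)
        fpr = sum(s >= cutoff for s in fp_scores) / len(fp_scores)
        j = tpr - fpr
        if best is None or j > best[1]:
            best = (cutoff, j, tpr, fpr)
    return best


# Hypothetical marker scores.
scores = {
    "TP1": 4.0, "TP2": 3.0, "TP3": 2.5, "TP4": 0.8,
    "FP1": 2.0, "FP2": 1.0, "FP3": 0.5, "FP4": 0.2,
}
cutoff, j, tpr, fpr = best_cutoff(
    scores,
    true_pos={"TP1", "TP2", "TP3", "TP4"},
    false_pos={"FP1", "FP2", "FP3", "FP4"},
)
```

With these invented values the selected cutoff is 2.5, which keeps three of four true-positive markers and none of the false-positive markers. Running this per replicate and intersecting the resulting lists mirrors the replicate-intersection step described above.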
At this point, you have "enriched proteins." The next question is: which proteins do you validate first?
Use a transparent scorecard that mixes quantitative confidence with biological plausibility. This is the backbone of TurboID candidate prioritization and it forces you to declare your assumptions.
| Dimension | What to compute | Why it matters |
| --- | --- | --- |
| Enrichment strength | log2 fold-change bait/control | higher often means closer/more consistent labeling |
| Statistical support | adjusted p-value / q-value or probability score | controls false positives across many proteins |
| Reproducibility | detection across biological replicates | reduces "one-run wonders" |
| Localization concordance | predicted/known compartment match | proximity labeling is spatial; mismatch flags artifacts |
| Prior knowledge | known interactions / pathway membership | speeds validation and interpretation |
A simple approach is to rank each dimension (e.g., 1–5) and sum a weighted score. The weighting should reflect your risk tolerance: discovery projects can weight biology higher; publication-critical maps should weight reproducibility and controls higher.
Output: a ranked list with explicit reasons for the top 10–50 candidates.
Done when: two people can independently apply your scorecard and get similar top candidates.
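A weighted scorecard takes only a few lines. In this sketch the weights, the dimension keys, and the candidate ranks are all hypothetical; the point is that the weights are written down once and applied identically to every candidate.

```python
# Hypothetical weights over the five dimensions in the table; they sum to 1.
WEIGHTS = {
    "enrichment": 0.30,
    "statistics": 0.25,
    "reproducibility": 0.25,
    "localization": 0.10,
    "prior": 0.10,
}


def weighted_score(ranks, weights=WEIGHTS):
    """Sum of weight * rank, where each rank is on a 1-5 scale."""
    missing = set(weights) - set(ranks)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in weights:
        if not 1 <= ranks[dim] <= 5:
            raise ValueError(f"{dim} rank must be between 1 and 5")
    return sum(weights[dim] * ranks[dim] for dim in weights)


def rank_candidates(candidates, weights=WEIGHTS):
    """Return (name, score) pairs sorted by descending score, then by name."""
    scored = [(name, weighted_score(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical candidates with per-dimension ranks.
candidates = {
    "X": {"enrichment": 5, "statistics": 5, "reproducibility": 4, "localization": 3, "prior": 2},
    "Y": {"enrichment": 3, "statistics": 4, "reproducibility": 5, "localization": 5, "prior": 5},
    "Z": {"enrichment": 2, "statistics": 2, "reproducibility": 2, "localization": 5, "prior": 5},
}
ranking = rank_candidates(candidates)
```

Because the function refuses candidates with a missing dimension, two people applying the scorecard cannot silently skip a criterion, which helps with the reproducibility check above.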
GO/pathway analysis is where many TurboID projects get over-interpreted.
Two rules keep it honest:
1. Use the right background universe. If your pull-down only "measured" a subset of the proteome, don't use the entire genome/proteome as the background. Use the set of proteins that were actually detected/quantified in your experiment as the universe for enrichment tests.
2. Separate localization validation from function discovery.
Tooling options: g:Profiler, Enrichr, DAVID, Reactome, and R packages like clusterProfiler can all work. What matters is (a) correct background, (b) multiple testing correction, and (c) not pretending GO terms are mechanistic proof.
Output:
Done when: enriched terms survive correction and remain stable across reasonable parameter choices.
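The background rule can be made concrete with a one-sided hypergeometric test in which the universe is the set of detected proteins. This is a minimal sketch with invented protein IDs and a single hypothetical term; the dedicated tools listed above do the same calculation across many terms and then apply multiple-testing correction, which this example omits.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def term_enrichment(hits, term_members, background):
    """One-sided enrichment p-value with everything restricted to the background."""
    background = set(background)
    hits = set(hits) & background
    term = set(term_members) & background  # term members never detected are ignored
    overlap = len(hits & term)
    return overlap, hypergeom_sf(overlap, len(background), len(term), len(hits))


# Hypothetical experiment: 20 detected proteins, 4 hits, one annotation term.
background = [f"P{i}" for i in range(20)]
term_members = ["P0", "P1", "P2", "P3", "P4", "Q1"]  # Q1 was never detected
hits = ["P0", "P1", "P2", "P10"]

overlap, p_detected = term_enrichment(hits, term_members, background)
# Same overlap tested against a made-up 20,000-protein universe.
p_genome = hypergeom_sf(3, 20000, 5, 4)
```

With the detected-protein universe the p-value is 155/4845, about 0.032; against the made-up 20,000-protein universe the same overlap looks many orders of magnitude more significant, which is the inflation the first rule warns about.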
Protein lists get interpretable when you can see modules: complexes, pathways, subcompartments.
A pragmatic workflow:
Interpretation guardrail: network edges reflect known evidence (literature, co-expression, etc.), not necessarily your experimental mechanism. Use networks to propose modules, not to declare direct interactions.
Output: a small number of annotated modules with short biological interpretations.
Done when: each module can be summarized in 1–2 sentences ("This cluster suggests X process near the bait under condition Y.").
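As a rough first pass at finding modules, you can restrict a known-interaction edge list to your hit list and take connected components. This is a deliberately simple stand-in for the clustering that network tools offer, and the proteins and edges below are hypothetical. The edges represent prior evidence, not anything measured in the TurboID experiment.

```python
from collections import defaultdict, deque


def modules_from_edges(hits, edges):
    """Group hit proteins into connected components using only hit-to-hit edges."""
    hits = set(hits)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Ignore self-loops and edges that leave the hit list.
        if a in hits and b in hits and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen = set()
    components = []
    for start in sorted(hits):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    # Largest modules first; ties broken alphabetically.
    return sorted(components, key=lambda c: (-len(c), c))


# Hypothetical hit list and known-evidence edges ("Z" is not a hit).
hits = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "Z")]

modules = modules_from_edges(hits, edges)
```

Singletons such as `F` are kept in the output on purpose: a strongly enriched protein with no known partners among the hits may be the most novel candidate rather than noise.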
AP-MS is optimized for stable complexes that survive lysis and purification. TurboID labels proteins in living cells and can capture transient or weak neighbors—but also captures bystanders in the same compartment.
Implications for downstream analysis:
If you're comparing method families (BioID/BioID2/TurboID/APEX and more), see BioID platform.
If you're packaging results for collaborators—or trying to future-proof a project for publication—aim to deliver:
No. TurboID reports proximity in living cells; it captures direct binders and nearby bystanders in the same compartment.
A localization-matched negative control processed identically; without it, enrichment statistics can reflect localization differences instead of bait-specific proximity.
Three biological replicates per condition is a practical minimum for stable enrichment calls; fewer replicates increase the risk that missing values drive your hit list.
Usually no. The core pipeline is evergreen; what changes is tooling and best-practice nuance, which you can handle via citations to recent reviews.
References