Calculating Sample Size Requirements for Temporal Dynamics in Single-Cell Proteomics
2021; Elsevier BV; Volume: 20; Linguagem: Inglês
10.1016/j.mcpro.2021.100085
ISSN1535-9484
AutoresHannah Boekweg, Amanda J. Guise, Edward D. Plowey, Ryan Kelly, Samuel Payne,
Tópico(s)Cell Image Analysis Techniques
Resumo•Detecting temporal change in proteins depends on fold change and variability.•Replicate time courses improve reliability of detecting temporal dynamics.•Temporal experiments require a dense sampling of cells to track gradual transitions.•Time-course trajectory experiments require more samples than two-state comparisons. Single-cell measurements are uniquely capable of characterizing cell-to-cell heterogeneity and have been used to explore the large diversity of cell types and physiological functions present in tissues and other complex cell assemblies. An intriguing application of single-cell proteomics is the characterization of proteome dynamics during biological transitions, like cellular differentiation or disease progression. Time-course experiments, which regularly take measurements during state transitions, rely on the ability to detect dynamic trajectories in a data series. However, in a single-cell proteomics experiment, cell-to-cell heterogeneity complicates the confident identification of proteome dynamics as measurement variability may be higher than expected. Therefore, a critical question for these experiments is how many data points need to be acquired during the time course to enable robust statistical analysis. We present here an analysis of the most important variables that affect statistical confidence in the detection of proteome dynamics: fold change, measurement variability, and the number of cells measured during the time course. Importantly, we show that datasets with less than 16 measurements across the time domain suffer from low accuracy and also have a high false-positive rate. We also demonstrate how to balance competing demands in experimental design to achieve a desired result. Single-cell measurements are uniquely capable of characterizing cell-to-cell heterogeneity and have been used to explore the large diversity of cell types and physiological functions present in tissues and other complex cell assemblies. An intriguing application of single-cell proteomics is the characterization of proteome dynamics during biological transitions, like cellular differentiation or disease progression. Time-course experiments, which regularly take measurements during state transitions, rely on the ability to detect dynamic trajectories in a data series. However, in a single-cell proteomics experiment, cell-to-cell heterogeneity complicates the confident identification of proteome dynamics as measurement variability may be higher than expected. Therefore, a critical question for these experiments is how many data points need to be acquired during the time course to enable robust statistical analysis. We present here an analysis of the most important variables that affect statistical confidence in the detection of proteome dynamics: fold change, measurement variability, and the number of cells measured during the time course. Importantly, we show that datasets with less than 16 measurements across the time domain suffer from low accuracy and also have a high false-positive rate. We also demonstrate how to balance competing demands in experimental design to achieve a desired result. Individual cells express a unique proteome; this is true for cells in a complex environment as well as cells in a laboratory-controlled cell culture experiment. These differences arise from both intrinsic and extrinsic factors, such as access to nutrients, spatial relationships to other cells, or cell cycle status (1Mahdessian D. Cesnik A.J. Gnann C. Danielsson F. Stenström L. Arif M. Zhang C. Le T. Johansson F. Shutten R. Bäckström A. Axelsson U. Thul P. Cho N.H. Carja O. et al.Spatiotemporal dissection of the cell cycle with single-cell proteogenomics.Nature. 2021; 590: 649-654Crossref PubMed Scopus (9) Google Scholar). Tissues in multicellular organisms often contain a variety of discrete cell types, each of which expresses a unique proteome, and the combination of these functional cell states gives rise to the overall tissue function. Single-cell measurements facilitate the study of such differences by quantitatively measuring transcript or protein abundances for individual cells (2Specht H. Slavov N. Transformative opportunities for single-cell proteomics.J. Proteome Res. 2018; 17: 2565-2571Crossref PubMed Scopus (56) Google Scholar, 3Doerr A. Single-cell proteomics.Nat. Methods. 2019; 16: 20Crossref PubMed Scopus (22) Google Scholar). Much of the early work in single-cell phenotypic characterization was done by mRNA sequencing (4Macosko E.Z. Basu A. Satija R. Nemesh J. Shekhar K. Goldman M. Tirosh I. Bialas A.R. Kamitaki N. Martersteck E.M. Trombetta J.J. Weitz D.A. Sanes J.R. Shalek A.K. Regev A. et al.Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.Cell. 2015; 161: 1202-1214Abstract Full Text Full Text PDF PubMed Scopus (2734) Google Scholar, 5Klein A.M. Mazutis L. Akartuna I. Tallapragada N. Veres A. Li V. Peshkin L. Weitz D.A. Kirschner M.W. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.Cell. 2015; 161: 1187-1201Abstract Full Text Full Text PDF PubMed Scopus (1423) Google Scholar), and this still remains a widely used data source. However, intracellular and extracellular functions are most often carried out by proteins, e.g., cytoskeletal structures, metabolic enzymes, signal transducers, and others. Previous work over the past decade has revealed that mRNA abundance measurements are a poor proxy for protein abundance measurements (for review, see (6Fortelny N. Overall C.M. Pavlidis P. Freue G.V.C. Can we predict protein from MRNA levels?.Nature. 2017; 547: E19-E20Crossref PubMed Scopus (67) Google Scholar, 7Liu Y. Beyer A. Aebersold R. On the dependency of cellular protein levels on MRNA abundance.Cell. 2016; 165: 535-550Abstract Full Text Full Text PDF PubMed Google Scholar, 8Payne S.H. The utility of protein and MRNA correlation.Trends Biochem. Sci. 2015; 40: 1-3Abstract Full Text Full Text PDF PubMed Scopus (81) Google Scholar)). Indeed, many important temporal trends in the dynamic proteome are not detected in mRNA data (9Waldbauer J.R. Rodrigue S. Coleman M.L. Chisholm S.W. Transcriptome and proteome dynamics of a light-dark synchronized bacterial cell cycle.PLoS One. 2012; 7e43432Crossref PubMed Scopus (95) Google Scholar). This is true both for bulk measurements and also for single cells (10Taniguchi Y. Choi P.J. Li G.-W. Chen H. Babu M. Hearn J. Emili A. Xie X.S. Quantifying E. Coli proteome and transcriptome with single-molecule sensitivity in single cells.Science. 2010; 329: 533-538Crossref PubMed Scopus (1313) Google Scholar, 11Ståhlberg A. Thomsen C. Ruff D. Åman P. Quantitative PCR analysis of DNA, RNAs, and proteins in the same single cell.Clin. Chem. 2012; 58: 1682-1691Crossref PubMed Scopus (48) Google Scholar, 12Darmanis S. Gallant C.J. Marinescu V.D. Niklasson M. Segerman A. Flamourakis G. Fredriksson S. Assarsson E. Lundberg M. Nelander S. Westermark B. Landegren U. Simultaneous multiplexed measurement of RNA and proteins in single cells.Cell Rep. 2016; 14: 380-389Abstract Full Text Full Text PDF PubMed Scopus (112) Google Scholar). A recent investigation of proteome dynamics during the cell cycle found that only 15% of mitotic cycling proteins had coordinated cycling mRNA transcripts (1Mahdessian D. Cesnik A.J. Gnann C. Danielsson F. Stenström L. Arif M. Zhang C. Le T. Johansson F. Shutten R. Bäckström A. Axelsson U. Thul P. Cho N.H. Carja O. et al.Spatiotemporal dissection of the cell cycle with single-cell proteogenomics.Nature. 2021; 590: 649-654Crossref PubMed Scopus (9) Google Scholar). Thus, to identify dynamic proteome responses at the single-cell level, proteomic measurements are necessary. Single-cell proteomics poses an enormous technical challenge and, until recently, global proteome profiling had not been demonstrated (for review, see (13Kelly R.T. Single-cell proteomics: Progress and prospects.Mol. Cell Proteomics. 2020; 19: 1739-1748Abstract Full Text Full Text PDF PubMed Scopus (28) Google Scholar)). As single-cell proteomics gains momentum, it is important to note the practical limitations that are encountered for its practitioners, particularly as they relate to the number of cells that can be analyzed within an experiment. Single-cell mRNA-sequencing experiments, which benefit from ligation-based barcoding (14Smith A.M. Heisler L.E. St Onge R.P. Farias-Hesson E. Wallace I.M. Bodeau J. Harris A.N. Perry K.M. Giaever G. Pourmand N. Nislow C. Highly-multiplexed barcode sequencing: An efficient method for parallel analysis of pooled samples.Nucleic Acids Res. 2010; 38: e142Crossref PubMed Scopus (218) Google Scholar), are able to multiplex tens of thousands of samples into a single data acquisition run (15Lan F. Demaree B. Ahmed N. Abate A.R. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.Nat. Biotechnol. 2017; 35: 640-646Crossref PubMed Scopus (167) Google Scholar). However, proteomics multiplexing remains limited to ~20 samples (16Li J. Van Vranken J.G. Pontano Vaites L. Schweppe D.K. Huttlin E.L. Etienne C. Nandhikonda P. Viner R. Robitaille A.M. Thompson A.H. Kuhn K. Pike I. Bomgarden R.D. Rogers J.C. Gygi S.P. et al.TMTpro reagents: A set of isobaric labeling mass tags enables simultaneous proteome-wide measurements across 16 samples.Nat. Methods. 2020; 17: 399-404Crossref PubMed Scopus (65) Google Scholar). Therefore, the primary limitation in the proteomic profiling of a very large number of individual cells (>1000) is still instrument acquisition time. For this reason, many researchers face practical limitations in the type of experiment that can be designed to investigate proteomic phenotypes at the single-cell level. Characterizing proteome dynamics over time is critical for understanding cellular differentiation, disease progression, and treatment response. In contrast to a two-state comparison, a time-course experiment collects measurements several times during a biological process. One fundamental question for this experimental design is how many time points need to be sampled in order to detect protein dynamics. Because of the practical limitations of single-cell proteomics, it is possible that the number of time points analyzed may not be sufficient to achieve statistical confidence in the dynamic trends. Whereas the t test commonly used in a two-state comparison is well behaved with a small number of samples in each state (e.g., 5–10), it is unknown how such a small sampling will impact the ability to detect trends in time-course data. A variety of tools have been created to help map expression dynamics or trajectories (17Alpert A. Moore L.S. Dubovik T. Shen-Orr S.S. Alignment of single-cell trajectories to compare cellular expression dynamics.Nat. Methods. 2018; 15: 267-270Crossref PubMed Scopus (31) Google Scholar, 18Trapnell C. Cacchiarelli D. Grimsby J. Pokharel P. Li S. Morse M. Lennon N.J. Livak K.J. Mikkelsen T.S. Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.Nat. Biotechnol. 2014; 32: 381-386Crossref PubMed Scopus (1587) Google Scholar, 19Bendall S.C. Davis K.L. Amir E.-A.D. Tadmor M.D. Simonds E.F. Chen T.J. Shenfeld D.K. Nolan G.P. Pe'er D. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development.Cell. 2014; 157: 714-725Abstract Full Text Full Text PDF PubMed Scopus (495) Google Scholar). However, these are often created within the assumptions of an experiment where the number of sampled cells across the time domain is potentially thousands. Herein, we present a method to facilitate estimating the number of cells needed in a temporal dynamics experiment by systematically exploring the impact of proteome variability and effect size through a large simulation similar to a power analysis. All calculations and data used in this article can be found in our publicly available GitHub repository, https://github.com/PayneLab/SingleCellSampleSize. Below are listed specific scripts used to create figures and metrics cited in the article. For the data presented in Figures 1 and 2, we simulated a large population of "cells" with a single protein abundance and time measurement using the standard formula y = mx + b as protein_abundance = slope ∗ time + 1 + ε. The error term, ε, is a random error that mimics the biological and/or technical variability in a measurement. This error is drawn from a zero-centered normal distribution with a specified standard deviation. For a single simulation, slope and standard deviation are chosen from slope s ∈ [0.5, 1, 2, 4] and standard deviation v ∈ [0, 0.25, 0.5, 0.75, 1]. We then created a population of 10,000 with random seeds for the time variable between 0 and 1. The full software for making simulated populations can be found in the GitHub repository in a file called simulate_data.R. To determine the true-positive (TP) and false-positive (FP) rates, we sampled a specific number of cells from the population, n_sample ∈ [7, 16, 20, 30, 100]. With the protein abundance and time values for these subsampled cells, we used cellAlign (17Alpert A. Moore L.S. Dubovik T. Shen-Orr S.S. Alignment of single-cell trajectories to compare cellular expression dynamics.Nat. Methods. 2018; 15: 267-270Crossref PubMed Scopus (31) Google Scholar) to interpolate a trajectory (supplemental Fig. S1). Next, we calculated the area between the interpolated trajectory and the population's true temporal trajectory, ABC_true. The null hypothesis (no change) was tested by calculating the area between the interpolated trajectory and a horizontal line at the mean protein abundance of the subsampled cells; this metric is called ABC_null. If the ABC_true < ABC_null, then we assert that the data from the subsampling represent a changing protein; if ABC_true > ABC_null, then we assert that the subsampling represents a nonchanging protein. The accuracy reported in Figure 1 is the result of 1000 subsamples per population. We also repeated the entire simulation 10 times, which is shown in Figure 1 as the error bars. The full software for calculating accuracy and making the panels in Figure 1 can be found in our GitHub repository in the file called makeFigure1.R. The process for calculating the false discovery was very similar to calculating accuracy, except that the correct answer was that there was no change, and the incorrect answer was that interpolated trajectories matched a sloped line. We simulated a large population of cells, using the same formula as previously used one, with the caveat that this population always had a true slope of zero. To calculate false discovery, we again subsampled cells from the population and interpolated a trajectory using cellAlign. We then calculated an area between the curve metric ABC_true, which reflected the distance between the interpolated trajectory and the population's true trajectory (i.e., no change). We also calculated an ABC_falsepositive metric, which reflected the area between the interpolated trajectory and a sloped line. For the plots in Figure 2, the slope was either 0.5, 1, 2, or 4. The FP rate reported in Figure 2 is the result of 1000 subsamples per population. We repeated the entire simulation 10 times, which is shown in Figure 2 as the error bars. The full software for simulation, calculations, and plotting the panels in Figure 2 can be found in our GitHub repository in the file makeFigure2.R. To generalize the TP and FP rates, we ran a separate simulation to recalculate these metrics using the slope/variation (S/V) ratio and number of subsampled cells. The data in Figure 3 were generated using the same types of simulations as described previously for Figures 1 and 2 except that different slopes and variations were used to calculate S/V values of 0.5, 1, 1.5, 2, 3, 4, and 6. The plotted data come from different simulations of a single S/V ratio with different values for S and V, respectively. For example, data for S/V of 0.5 were generated using S/V = (0.5/1.0; 0.75/1.5; 1.0/2.0; 1.5/3.0; 2.0/4.0; and 3.0/6.0). As illustrated in supplemental Figure S3, different S/V combinations for the same ratio have equivalent accuracy. This demonstrates that the S/V metric is robust to scale and will still be useful for methods that influence the ratio, such as tandem mass tag experiments, which result in signal compression. The other S/V values were calculated in the same manner. The full code used to simulate, calculate, and plot data in Figure 3 can be found in our GitHub repository in the files makeFigure3A.R and makeFigure3B.R. Supplemental Figure S2 reanalyzes published single-cell proteomics data (20Dou M. Clair G. Tsai C.-F. Xu K. Chrisler W.B. Sontag R.L. Zhao R. Moore R.J. Liu T. Pasa-Tolic L. Smith R.D. Shi T. Adkins J.N. Qian W.-J. Kelly R.T. et al.High-throughput single cell proteomics enabled by multiplex isobaric labeling in a nanodroplet sample preparation platform.Anal. Chem. 2019; 91: 13119-13127Crossref PubMed Scopus (56) Google Scholar) and uses two cell types, the murine epithelial cell line C10 and the murine endothelial cell line SVEC, to plot fold change between cell type and the within-group variation. The fold change for each protein is found by taking the absolute difference between the mean abundance for C10 cells and SVEC cells. The plotted variation is the standard deviation of C10 cells. The full code used to analyze, calculate, and plot data for Supplemental Figure S2 can be found in our GitHub repository in the file makeFigure2.R. To characterize how well a subsample of measurements can approximate the true slope and variation of a larger population, we ran a simulation as described previously with true S/V = 1. From the large population of 10,000 cells, we subsampled a given number of cells n_sample ∈ [7, 16, 20, 30, 100] and calculated S/Vest. The S/Vest is calculated by using linear regression to fit a line to the subsampled data. The slope of the fitted line is then used as our estimated slope. The estimated variation is found by calculating the standard deviation of the residuals off the fitted line. We calculated S/Vest for 1000 independent subsamples of the population and computed the distribution of the difference S/Vtrue − S/Vest. This is plotted in Figure 4A. Figure 4B shows the number of proteins that would remain after using S/Vest = 1 as a cutoff. We simulated five populations with 1000 cells, at S/Vtrue ∈ [0, 0.5, 1, 1.5, 2]. We subsampled 30 cells from a population, calculated S/Vest, and discarded the subsampling event if S/Vest < 1. The figure shows the percent of sampling events that remained after filtering. The full code used to analyze, calculate, and plot data for Figure 4, A and B can be found in our GitHub repository in the file makeFigure4.R and simulate_data_figure4.R. One of the challenges associated with mass spectrometry measurements is that they are inherently destructive. In order to measure the proteins in a cell, the cell itself is destroyed (e.g., lysed). Thus, for an experiment that measures change over time, the cells measured at time0 will be different from the cells measured at time1. This means that the change in protein abundance observed between time0 and time1 has at least two potential sources. Some of the change can be attributed to temporal dynamics. The second source of change between cells measured at time0 and time1 is attributed to cell-to-cell variability. We specifically emphasize that even for synchronized cells, or experimental designs where great effort has been taken to homogenize cellular state, there will be a real and observable variability in the proteome between individual single cells. Furthermore, this variability is often much greater than one might initially expect. A second important element of a time-course experiment is how time is measured. There are numerous relevant applications of time in biology, including cellular differentiation, the transition from health to disease, or response to an external stimulus. In these various experimental systems, time can be absolute (2 PM) or relative (5 min after stimulus). Time can be an observable fact (date) or a value inferred from the data (relative time during the cell cycle as measured by the abundance of various markers). In studying disease progression, the time metric is more accurately a "pseudotime" that measures the approximate progression from a healthy to a diseased state, perhaps measured by visible morphological features. Depending on the specific scientific question, an experimental sampling of the time domain might be rigorously defined at very specific intervals or a random sampling. In this article and the statistical simulations described later, time is abstract. The variable representing time varies within the bounds from zero to one; zero represents the temporal beginning of an experiment and one represents the end. While many experiments are able to collect samples at multiple, discrete, and specific time points, there are also many experiments where this is not possible. For example, consider cells sampled from the bone marrow with an experimental goal of understanding blood cell development. These collected cells exist in a wide variety of states, including hematopoietic stem cells, lymphoblasts, T cells, and all stages in between (along with all other developmental end points). An important goal of single-cell characterization is to understand the transitions between these states as cells mature. A similar experiment could be designed around disease progression, with a goal of understanding the transition between a healthy cell and a dysfunctional cell. The experiments described previously view time as a continuum and are interested in the gradual transition of cells over time. This experimental design is fundamentally different from one that has control over time, such as a pharmacological dose/response test. If one has control over time, a t test is appropriate. The focus of this article addresses experiments that do not have the option of collecting samples at discrete and prespecified time points. In this situation, grouping samples into early and late time points creates a structure appropriate for a t test. However, this is inappropriate when one wants to understand the gradual transition between continuous time points. For these experiments, we explore alternative statistical metrics. To understand how measurement variability, effect size, and the sample size affect our ability to detect temporal changes in proteome dynamics, we performed a large simulation across the relevant parameter space. Each simulated cell has one protein measurement and an associated time variable, calculated from two input parameters. First, the simulation uses a simple linear change as the shape of temporal dynamics, and in our model, this rate of change is called the slope. Second, we specify a measurement variability modeled as a normally distributed error term. Using slope, variability, and a time value between 0 and 1, we can calculate protein abundance (see Experimental procedures section). The simulation starts by creating a large number of cells that cover the entire time range; each cell is represented as an (abundance, time) data point. From the simulated population, we subsampled a small number of cells with the primary goal of determining whether this small sample accurately represents the larger population. The subsampled measurements are used to interpolate a temporal expression trajectory using cellAlign (17Alpert A. Moore L.S. Dubovik T. Shen-Orr S.S. Alignment of single-cell trajectories to compare cellular expression dynamics.Nat. Methods. 2018; 15: 267-270Crossref PubMed Scopus (31) Google Scholar); this interpolated trajectory was then compared with the true population trajectory and a null model (supplemental Fig. S1). If the subsampled trajectory was closer to the true population trajectory, then we classified this subsampling event as correct; if the subsampled trajectory was a better fit to the null model, then this subsampling event was classified as incorrect. We ran multiple simulations with different parameter values for slope and variability. In each simulation, we calculated the average accuracy (see Experimental Procedures section). Figure 1 shows the accuracy of detecting protein dynamics for various combinations of slope, measurement variation, and the number of sampled cells. Several expected trends emerge from the simulation. First, the accuracy improves if more cells are sampled across the time domain. Regardless of the slope or measurement variability, increasing the number of cells improves the temporal expression trajectory interpolated from the subsampling. The smallest number that we sampled, seven cells, has poor accuracy in almost any (slope, variation) parameter set. A significant improvement is seen in increasing from 7 to 16 cells, followed by steady improvement as the number of cells increases to 20, 30, or beyond. Second, accuracy decreases with greater measurement variability. For example, in the simulations where slope = 1, a 16 cell sample has an average accuracy of 95% if the measurement variability is 0.25 but has an average accuracy of 68% if the variability = 1.0. To characterize the potential for incorrectly labeling a protein as changing, when it actually was unchanged in the time course, we simulated data where the slope parameter was zero. From this large population, we again subsampled, interpolated an expression trajectory using cellAlign, and compared the trajectory to the population's true trajectory (i.e., no change) or a simple sloped line. If the interpolated trajectory more closely matched a sloped line, despite being composed of data points in a population whose slope was zero, we counted this as an FP. As we discovered in simulations previously, the number of sampled cells and the measurement variability have a significant impact on the FP rate (Fig. 2). In several simulations with a small subsampling (e.g., seven cells), the FP rate approached 40%. To help generalize the results of our simulation, and make them more immediately applicable to proteomics datasets, we grouped parameter sets by their S/V ratio. The simulations in Figures 1 and 2 report results that used a convenient numeric scale. However, proteomic datasets reported in the literature have a wide range of possible values, with some datasets reporting raw quantitative values in the millions and others using log-transformed and zero-centered data. By transforming our results into an S/V ratio, we directly test whether the observed TP and FP trends are scale free. Thus, regardless of how quantitative protein data are obtained or processed, the S/V ratio can provide applicable guidance. In addition, it does not matter whether the source of variability is technical or biological; we account for both in this combined metric. For a wide range of S/V ratios, we ran simulations to generate the TP and FP rates (see Experimental procedures section). This explicit exploration of the relationship between expression change and variability revealed distinct trends in the ability to correctly identify protein dynamics. For example, in an experiment with 16 cells, an S/V = 2 would have an 80% TP rate (Fig. 3A), meaning that if 100 proteins had this trajectory slope and measurement variability, we expect to detect 80 of them (the other 20 go undetected). Under the same conditions, we also see a 15% FP rate. This means that of all the nonchanging proteins, 15% of them would be incorrectly identified as changing. As with the results presented in Figures 1 and 2, the TP and FP rates improve when a larger number of cells are sampled across the time course. If an experiment sampled 30 cells instead of 16 cells, the TP rate would improve from 80% to 90% for S/V =2; coordinately, the FP rate would fall from 15% to 5%. These charts are essential in understanding how the accuracy of temporal trajectory detection depends both on the number of cells analyzed across the time domain and also the trajectory's slope and the protein's inherent measurement variability. To help frame these results, we sought to understand S/V values for proteins in real data. We examined a single-cell proteomics experiment that compared two different cell types (20Dou M. Clair G. Tsai C.-F. Xu K. Chrisler W.B. Sontag R.L. Zhao R. Moore R.J. Liu T. Pasa-Tolic L. Smith R.D. Shi T. Adkins J.N. Qian W.-J. Kelly R.T. et al.High-throughput single cell proteomics enabled by multiplex isobaric labeling in a nanodroplet sample preparation platform.Anal. Chem. 2019; 91: 13119-13127Crossref PubMed Scopus (56) Google Scholar), with a sufficient number of replicates to obtain a reliable estimate of within-group variability (n > 20). We note that this dataset does not demonstrate temporal dynamics but rather the magnitude of differences in protein abundance between biological states. This can still be used to approximate slope if time is scaled to a unit value. Fold change and within-group variability were calculated for each protein in this dataset (supplemental Fig. S2). Most proteins have a small fold change, and often the magnitude of fold change is similar to the magnitude of variability. Thus, relatively few proteins have an S/V ratio above one. As shown in Figure 3, proteins with an S/V below one will have a low TP and high FP rate unless a very large number of cells are used. Proteins with a more attractive TP and FP rate, such as S/V = 2, are only ~3% of proteins in the dataset. Proteins with S/V > 4 are exceedingly rare (<0.2%). The analysis of real data does not benefit from knowing the true protein expression dynamics, as one knows in a simulated dataset. Thus, when trying to apply the lessons learned in a simulation (e.g., Fig. 3) to a real-world time-co
Referência(s)