Update for the logicle data scale including operational code implementations
2012; Wiley; Volume: 81A; Issue: 4; Language: English
10.1002/cyto.a.22030
ISSN 1552-4930
Topic(s): Cell Image Analysis Techniques
Abstract: Since its introduction in the early 2000s, the logicle data scale (1, 2) has been widely adopted by the cytometry community. The characteristics of logicle displays and their use to facilitate interpretation of flow cytometry data have been discussed in several papers (3, 4). FlowJo software (Tree Star, Ashland, OR) offers the option of setting logicle transformation as the default for some data types. Diva software (BD Biosciences, San Jose, CA) offers logicle displays with scaling customized to the distribution of the data in each plot (called "Biexponential" but using the logicle constraints on the biexponential and logicle methods for selecting display parameters). In our experience with data from many different biological applications and a variety of instruments, logicle transformation provides good representation of all cytometry data for which direct linear presentation is not the most appropriate choice and has shown no tendency to generate artifactual features in displays. On this basis, we now recommend logicle as the appropriate default method for nonlinear transformation of cytometric data.

In support of this recommendation, this communication provides (1) minor updates to the logicle specification, (2) a rigorously defined parameterization that should clarify questions that have come from people who were implementing logicle transformations in software, (3) technical details that were not considered suitable for the original publication but have become important for the standardization and consistent application of the method, (4) reference implementation code for both high-precision and routine calculations, and (5) a more detailed mathematical exposition on Biexponential Functions (Supporting Information).
We expect that the formulas presented here will be the normative definition for the logicle transformation included in the ISAC Standards Committee's recommendation in Gating-ML (Analytical Cytometry Standard (ACS), Gating-ML Component, http://flowcyt.sourceforge.net/gating/). Implementations of the logicle scale in the Java, C++, and C programming languages are provided in the Supporting Information and are released under the Revised Berkeley Software Distribution open source license for free use in cytometry software. A module for integrating the logicle scale into the R statistical programming environment is available in the Bioconductor repository (http://www.bioconductor.org/). Several algorithms and computational methods used in this work were adopted or modified from Ref. 5 (NR3, www.cambridge.org/numericalrecipes and www.nr.com).

The objective in developing the logicle transform was to avoid the problems encountered with logarithmic scaling of cytometric data when primary measurements and/or compensated data include very low and negative values (illustrated in Fig. 1B). The inverse hyperbolic sine function (sinh⁻¹) offers attractive features as a scaling function in that it gives a near-linear region above and below data zero, near-logarithmic behavior at high data values, and a narrow but very smooth transition from one to the other. Transformations of this sort have been used for various kinds of data (6, 7). However, we found that sinh⁻¹ and similar formulations do not provide enough flexibility to maintain consistent near-log behavior for high data values while adjusting the width and slope of the near-linear region to optimally display different data sets. Therefore, we adopted a more generalized biexponential form and constrained it so that a single width parameter (W in the mathematical expression below) sets the negative end of the scale and the width and slope of the near-linear region.
This single parameter is used to tune the display for a particular data set. The only really comparable transform we are aware of is the HyperLog (8). Truncating a power series expansion of the negative exponential in a logicle formula to just the first-order term generates a HyperLog function. The full exponential function in the logicle version makes the near-linear region closer to linear and the near-log region closer to logarithmic than in a corresponding HyperLog. Also, at least as originally formulated, HyperLog does not provide a single adjustable parameter comparable to the logicle width for matching to specific data sets.

Figure 1. Appropriate and inappropriate selection of display types and display parameters. The lower histograms show events in the boxed region of the corresponding upper bivariate plots. In each of the logicle histograms, the left end of the integration bar approximates the median of the distribution. (A) Uncompensated logicle display of mouse spleen cells stained for only CD19 using a PE-Cy5.5 conjugated reagent and showing spectral overlap onto the PE-Cy7 channel. (B) A 4.5-decade logarithmic display of the same data after compensation. The medians of both the CD19+ and the CD19− populations are near zero and below the minimum of the logarithmic scale. The apparent peak in the gated population at about data value 90 is actually at the 81st percentile of the distribution. This effect is sometimes referred to as a "log artifact." (C) Logicle display of the same data with a width value that is inappropriately low for the data. The median of the CD19+ population is on scale near zero but appears as a low point in the histogram display. Most of the negative-value data points pile up at the baseline. (D) Adding 1.5 decades of negative display space (A = 1.5) with the same 4.5-decade base and 0.25-decade width as in (C) brings the whole distribution on scale but with the CD19+ population looking bimodal in the PE-Cy7 dimension. (E) A logicle display of the same data with an appropriate choice of parameters: A = 0 and a higher value of W than in (C). The CD19+ population is well defined with no indication of bimodality, and all but a few data points are displayed on scale. Data details: Cells from the spleen of an 8-week-old BALB/c mouse were stained with antibody to CD19 tagged with PE-Cy5.5 (clone 6D5, cat#RM7718, Invitrogen, Carlsbad, CA). The cells were analyzed on a 4-laser LSR II (BD Biosciences). The measurement channels used in the figure are both excited by a 150-mW 532-nm laser, with a 710/50 emission bandpass filter on the PE-Cy5.5 channel and a 780/60 emission bandpass filter on the PE-Cy7 channel. Data analysis, including fluorescence compensation and displays, was carried out with FlowJo 9.3.1 (Tree Star). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The original specification of the logicle method (1, 2) yields a display scale spanning some range of natural or common logarithms. However, for good software practice, it is desirable to scale onto the unit interval [0,1] and use a separate viewing transform to control the layout of the graph on the page or screen. That approach is taken here. The original formulation included a parameter T, which was approximately, but not exactly, the top of scale. For certain allowable values of the parameters, the difference can be significant. Here, we present formulas for which T is exactly the top of scale. Experience with the logicle methods has shown that in a few cases it is useful to include a parameter A specifying an additional number of decades in the negative data range. In this article, we define the parameter A rigorously and show how it alters the previously published formulas. We also provide a set of sanity checks to ensure that the selected combinations of the parameter values will lead to useful visualizations.
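The sanity checks mentioned above can be sketched as a small validation routine. This is a minimal illustration, not the reference implementation: the function name `check_logicle_parameters` is hypothetical, the bounds on T, M, and W are the ones the article states further on (T > 0, M > 0, 0 ≤ W ≤ M/2), and the lower bound on A is an assumption inferred from the article's remark that A = −W removes the negative range entirely.

```python
def check_logicle_parameters(T, W, M, A=0.0):
    """Raise ValueError if a logicle parameter combination is out of range.

    T: top-of-scale data value; M: decades of the near-log scale;
    W: width of the near-linear region in decades;
    A: additional decades of negative data range.
    """
    if not T > 0:
        raise ValueError("T (top of scale) must be positive")
    if not M > 0:
        raise ValueError("M (number of decades) must be positive")
    if not W >= 0:
        raise ValueError("W (linearization width) must not be negative")
    if 2 * W > M:
        raise ValueError("W must not exceed M/2")
    # Assumption, not stated as a rule in the text: A = -W already gives a
    # display with no negative range, so smaller values are rejected here.
    if A < -W:
        raise ValueError("A must not be less than -W")


# Typical settings: 18-bit top of scale, 4.5 decades, half-decade width.
check_logicle_parameters(T=262144.0, W=0.5, M=4.5, A=0.0)
```

A caller would run this once when a scale is constructed, so that an unusable combination fails early rather than producing a misleading plot.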
The definitions of the formal parameters T, M, W, and A in terms of common (base 10) logarithms are used here and recommended for Gating-ML, because these are relatively intuitive or easily explained to ordinary users. The value of T is an essentially arbitrary choice of units for the top of the data scale and is chosen differently by the manufacturers of different instruments. The standard logicle scale (with A = 0) is defined so that the most negative value on scale has the same absolute value as the most positive data value in what can be considered the quasilinear region of the scale. The zero data value is at the center of the quasilinear region, and versions differing only in the value of W approach the same logarithmic behavior at the high end. This definition has the desirable feature of making the scale position of large data values essentially the same among logicle displays with different values of W and closely matching a true logarithmic display with the same value for M, which facilitates visual comparison of distributions. While the formulation in these terms is oriented toward production of consistent and readily compared plots for cytometric data sets, logicle transformation can be useful for other purposes, as illustrated by the use of logicle-transformed data as input for automated cluster analysis (9).

In a few situations, altering the negative data range to be greater or less than the nominal quasilinear region of a standard logicle transformation may be desirable, so we have introduced the formal parameter A to specify additional decades of negative data values. Figure 1 illustrates the effects of different choices of the logicle parameters. As shown in Figure 1D, positive values of A can be used to extend the range of negative data values on scale; however, the added scale range will not be quasilinear and can lead to spurious data peaks like those seen in logarithmic displays (Fig. 1B).
If there is simply a need to display more negative data range, increasing W (Fig. 1E compared to 1C) will accomplish this while maintaining quasilinearity for all on-scale negative values. For data that cannot include negative values, it may be advantageous to set A = −W, producing a display with no negative range but with zero on scale and with near-linear behavior near zero. Nonzero values of A will change the display scale of large data values, whose consistency is one of the desirable features of the standard logicle method. The optional parameter A should not be a routinely adjusted user parameter but should instead be reserved for these special cases. In general, the width parameter W is the only one that should be routinely varied, and the best approach in general is to set it with respect to the most negative relevant values in the data as described in the original paper (2).

The parameters should be chosen so that T > 0 and M > 0. In addition, we should always choose 0 ≤ W ≤ M/2. The choice W = M/2 specifies a scale that is essentially linear over the whole range except for a small region of large data values. For situations in which values of W approaching M/2 might be chosen, ordinary linear display scales will usually be more appropriate. The choice W = 0 gives essentially the hyperbolic sine function, sinh x.

With the added restriction to a scale range of [0,1], the values of the parameters a, b, c, d, and f can be fixed. The mathematical motivation and derivation for this are discussed in the Supporting Information "Biexponential Functions." The resulting formulas are presented later. Consult the Supporting Information for additional information on the definition and significance of these points. When w = 0, we know that d = b, and otherwise d is restricted to the interval (0, b). From these values, the final values of c and f can be computed trivially.
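A minimal sketch of how the five coefficients might be computed is given here. The relations used are not spelled out in the text above, so they are assumptions: they follow the parameterization as we understand it from the Gating-ML specification, namely w = W/(M + A), x2 = A/(M + A), x1 = x2 + w, x0 = x2 + 2w, b = (M + A) ln 10, d the root of 2(ln d − ln b) + w(d + b) = 0, and B(x) = a·e^(bx) − c·e^(−dx) − f. All function names are hypothetical, the sign convention for f may differ in other implementations, and plain bisection is substituted for whatever root finder the reference implementations use. The inputs are assumed to already satisfy T > 0, M > 0, and 0 ≤ W ≤ M/2.

```python
import math


def solve_d(b, w, iterations=200):
    """Return d in (0, b] with 2*(ln d - ln b) + w*(d + b) = 0 (bisection)."""
    if w == 0:
        return b  # the text notes d = b when w = 0
    lo, hi = 0.0, b  # g -> -inf as d -> 0, and g(b) = 2*w*b > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g = 2.0 * (math.log(mid) - math.log(b)) + w * (mid + b)
        if g > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def logicle_parameters(T=262144.0, W=0.5, M=4.5, A=0.0):
    """Compute biexponential coefficients for a scale on the interval [0, 1]."""
    w = W / (M + A)
    x2 = A / (M + A)        # scale position of the added negative range
    x1 = x2 + w             # scale position of data value 0
    x0 = x2 + 2.0 * w       # positive end of the quasilinear region
    b = (M + A) * math.log(10.0)
    d = solve_d(b, w)
    c_a = math.exp(x0 * (b + d))                       # ratio c / a
    f_a = math.exp(b * x1) - c_a * math.exp(-d * x1)   # ratio f / a
    a = T / (math.exp(b) - c_a * math.exp(-d) - f_a)   # forces B(1) = T
    return {"a": a, "b": b, "c": c_a * a, "d": d, "f": f_a * a,
            "x0": x0, "x1": x1, "x2": x2}


def biexponential(x, p):
    """Data value at scale position x: a*exp(b*x) - c*exp(-d*x) - f."""
    return p["a"] * math.exp(p["b"] * x) - p["c"] * math.exp(-p["d"] * x) - p["f"]


p = logicle_parameters()
print(biexponential(1.0, p))      # top of scale, should be close to T
print(biexponential(p["x1"], p))  # data zero, should be close to 0
```

Under these assumptions the construction guarantees that scale position 1 maps to T and scale position x1 maps to data value 0, which gives two inexpensive checks for any implementation.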
As the inverse of the biexponential functions cannot be given in closed form either, to compute scale values for arbitrary data, we must also solve it numerically. Once the two exponentials have been computed, the biexponential function value and the first two derivatives can be computed with only a few more multiplications and additions. Therefore, Halley's method (NR3 p. 463; Ref. 5) with cubic convergence is used in the reference implementations. This is possible because very good initial estimates can be generated using the known properties of the function. If the data value y ≥ f, then we are in the approximately logarithmic part of the function, and ln y is a good approximation. When y < f, a linear approximation using the slope at 0 is used to generate the initial estimate.

Evaluation of the biexponential functions using a library exponential function is straightforward with the exception of the neighborhood of x1, that is, close to data value 0. In that region, a naive calculation loses significance by subtracting terms that are nearly equal. Therefore, in that region, we use a Taylor series approximation instead and evaluate it according to the method of NR3 p. 202 (5) to preserve numerical accuracy.

Two reference implementations of the logicle scale are provided in each supported language (Supporting Information). The first, "Logicle," computes the scale and its inverse to full double precision (numerically accurate to approximately seven parts in 10^16). This is, of course, far higher accuracy than needed for practical applications in cytometry, but it serves as a reference against which the accuracy of other implementations can be measured in the compliance tests. The second, "FastLogicle," is the version to be used in normal applications, because it is significantly faster to compute and provides suitable precision for real applications. That version uses binary search (NR3 p. 114; Ref. 5) and linear interpolation (NR3 p. 117; Ref. 5) on an equally spaced table of values of the biexponential function computed as described earlier.

We have found the value M = 4.5 (scaling for high data values approximates a 4.5-decade logarithmic scale on the same plot space) to be generally suitable for all fluorescence measurements in flow cytometry, so this is the default value supplied in the reference implementations if the value is not explicitly given.

Software used:
Java Development Kit, Oracle, Redwood Shores, CA, http://www.oracle.com/technetwork/java/
Eclipse Development Environment, Eclipse Foundation, Ottawa, Ontario, Canada, http://www.eclipse.org/
JUnit Test Framework, JUnit.org, http://www.junit.org/
Visual Studio 2008, Microsoft, Redmond, WA, http://www.microsoft.com/visualstudio/en-us/products/2008-editions
Google Test, Google, Mountain View, CA, http://code.google.com/p/googletest/
GNU C and C++ Compilers, Free Software Foundation, Boston, MA, http://gcc.gnu.org/
R, The R Project for Statistical Computing, Wien, Austria, http://www.r-project.org/

Unit testing was performed using JUnit for the Java version and the Google Test framework for the C++ version.

The potential dynamic range of signals in a particular measurement condition can be taken as the ratio of the maximum measurement value to the uncertainty in the minimum measurement. In the illustration in Figure 1E, the standard deviation (SD) of the CD19+ population in the PE-Cy7 dimension is about 94 units. The top of scale is nominally 262,144. Taking the SD as a measure of the minimum uncertainty, we get a dynamic range of about 2,800 or 3.45 decades. The effective dynamic range in a logicle display can be taken as the ratio of the most compression in the scale (at the top of scale) to the least (at the center of the near-linear zone). For the plots in Figure 1E, this ratio is about 1,500 or 3.18 decades.
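The conversions from ratio to decades quoted above can be reproduced with a few lines of arithmetic. The sketch uses only the numbers given in the text (top of scale 262,144, SD of about 94 units, effective ratio of about 1,500); the helper name `decades` is hypothetical, and the effective ratio itself is taken as quoted rather than recomputed from the scale.

```python
import math


def decades(ratio):
    """Express a ratio as a number of decades (common logarithm)."""
    return math.log10(ratio)


top_of_scale = 262144.0
sd_units = 94.0

potential_ratio = top_of_scale / sd_units
print(round(potential_ratio))               # 2789, i.e. about 2,800
print(round(decades(potential_ratio), 2))   # 3.45
print(round(decades(1500.0), 2))            # 3.18
```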
As data values 94 units apart are well separated in the vicinity of the data zero, this display and dynamic range are clearly adequate for showing all significant features in the data. A rigorous discussion of dynamic range is presented in the Supporting Information "Biexponential Functions."

The authors thank Josef Spidlen at the British Columbia Cancer Agency, Canada, Noah Zimmerman at Stanford University, and Steven Amerige at SAS Institute for help in debugging the various versions of the reference implementation and cross compiling with different compilers on different operating systems. The authors thank Josef Spidlen, Gaurav Sharma, James Cavenagh, and an anonymous skeptical reviewer for helpful comments on the draft manuscript and Eliver EB Ghosn for supplying the data used in Figure 1.

Additional Supporting Information may be found in the online version of this article.
Reference(s)