Data visualization for inference in tomographic brain imaging
2019; Wiley; Volume: 51; Issue: 3; Language: English
10.1111/ejn.14430
ISSN 1460-9568
Authors: Cyril Pernet, Christopher R. Madan
Topic(s): Advanced Neuroimaging Techniques and Applications
Abstract
Tomographic imaging (i.e. magnetic resonance imaging [MRI], positron emission tomography [PET], X-ray computed tomography [CT]) offers a unique window into structure-function relationships in the living brain. Nowadays, it is standard practice to acquire entire brain volumes and perform statistical analyses at each voxel, an approach known as statistical parametric mapping (Friston, Frith, Liddle, & Frackowiak, 1991). Because it is impossible to report the statistical results for every voxel, summary tables and figures are important. The choices authors make when creating such figures should, however, not be driven by aesthetic considerations alone but also by the message to convey (Rougier et al., 2014). This is not to say that beautiful figures should not be used, as more appealing figures might in fact help readers remember results (Borkin et al., 2013; Madan, 2015b). At the intersection of psychology, computer vision, graphic design and statistics, there is a field of research that looks at how to represent information and which features are beneficial or harmful in figure designs. For instance, Cleveland and McGill (1983) showed that changing the axis scaling in scatter plots can alter inference on associations between variables. Siegrist (1996) showed that using perspective in pie charts leads readers to falsely infer magnitude differences, because the slices that are closer to the reader appear larger than those in the back. In general, it is recommended to plot the data rather than summary statistics, as the same summary values can be obtained from very different distributions, some of which preclude the use of certain statistical tests (see e.g. Anscombe, 1973, for correlations or Weissgerber, Milic, Winham, & Garovic, 2015, for bar graphs). Here we discuss the information provided in figures when presenting tomographic data results.
Many proposals have already been made by others; what we offer is a principled way to choose among those proposals and apply them.
A review of articles using tomographic techniques published between January 2016 and June 2018 in the European Journal of Neuroscience (N = 30; https://github.com/CPernet/MRI_FaceData_Wakeman-Henson/blob/master/DataViz/EJN_paper_review.csv) shows that four broad types of messages are obtained from statistical parametric maps: (a) demonstrating an effect over the whole brain or in specific tissues (e.g. grey matter); (b) showing the anatomical location of an effect; (c) revealing a hemispheric asymmetry for a condition or stimulus; (d) demonstrating the involvement of a set of areas or networks in a given task or between groups or conditions. Associating the message with the design shows clearly that slices are preferred to display the anatomical location of an effect (5.3 slice/render ratio), while displays of sets of areas and networks are less clearly associated with a design (2.5 slice/render ratio), although still dominated by slices. The anatomical location of an effect was described 19 times, with 16 figures using slices and three using renders. The involvement of a set of areas was described 22 times, with 15 figures using slices only and six using renders (three of them combining renders and slices). Among all 30 studies, only two showed the raw statistical data (i.e. unthresholded map), and 13 (i.e. 43%) did not produce any data plots associated with the maps. Among the 17 studies plotting data associated with maps, three showed inconsistency in that mapping (i.e. a plot for one effect but not another) and 10 plotted significant results only; that is, only 23% of all studies plotted results independent of statistical significance. In the following, we propose graphical designs for each type of message, adopting the three principles proposed by Cleveland and McGill (1985).
First, for readers to appreciate where an effect is, slices or renders should be used depending on the message (principle 1: detection). Second, to improve inference, visual information must be assembled to create a unified representation of the results (principle 2). Third, to convey information about the strength of effects, accurate colour scales and plots must be used (principle 3).
Proposals are illustrated using data from Wakeman and Henson (2015), in which 16 participants viewed famous, unfamiliar and scrambled faces. Each image was repeated twice (immediately in 50% of cases and 5–10 stimuli apart for the other 50%) and subjects pressed one of two keys with either their left or right index finger, indicating how symmetric they regarded each image relative to the average. Here, only the main effects of face recognition (famous faces + unfamiliar faces > scrambled faces) are investigated, independently of repetition levels. Resources necessary to process raw data and generate the figures in this article are available at https://github.com/CPernet/MRI_FaceData_Wakeman-Henson and https://www.github.com/cMadan/MRIdataviz. All figures are also available under a CC BY licence at DataShare, https://doi.org/10.7488/ds/2516 (Pernet & Madan, 2019).
The first principle, detection, refers to the ability of readers to detect where effects are. Presenting an SPM using multiple views is, therefore, better than any single-view approach (Madan, 2015a; Ruisoto, Juanes, Contador, Mayoral, & Prats-Galino, 2012). We can, however, distinguish two general cases that will drive the design: presenting networks/sets of areas involved in a task vs. illustrating the precise anatomical involvement of an area. In the first case, readers must be able to detect all areas, and in the second case, they must be able to detect the spatially circumscribed area under scrutiny.
Our mini review shows that slices are typically shown even when the message is about sets of areas, thus failing to show the distribution of activity throughout the brain. Figure 1 illustrates this using the thresholded map for the contrast famous faces + unfamiliar faces > scrambled faces. In the slice view, we can clearly see bilateral fusiform activity. The surface view also shows the extent of this activity along the fusiform gyrus, particularly on the inflated surface. This surface view provides, in addition, some indication of the distribution of activity. Consideration should, therefore, be given to whether a 'regular' grey matter (pial) surface or an inflated surface better conveys the cluster extent. Indeed, the presentation of multiple image display techniques ('fused images') has been shown to aid in data interpretation (e.g. increase in location agreement among clinicians) and comprehension (e.g. relating lesions to an activation pattern; see Stokking, Zubal, & Viergever, 2003, for a review). The choice between inflated and pial surfaces can be particularly relevant if some clusters lie deep enough within a sulcus not to be visible on a pial surface, such as the occipital clusters in Figure 1. The glass brain representation gives the most complete depiction of 'active' areas but makes it difficult to determine the precise location of the activity. When the message is about networks or the involvement of many areas, we thus recommend using a glass brain (Madan, 2015a), preferably shown from two viewing angles with a slight offset to aid in the interpretation of overlapping clusters and the perception of depth. If space is available, this may be complemented with slices when subcortical structures are involved, as presenting only a glass brain view makes it difficult for the reader to determine the depth of the activation cluster. To illustrate the anatomical location of an effect, slices and (orthogonal) cross-sectional views are recommended.
For deep structures, additional three-dimensional representations may also be useful (Ruisoto et al., 2012) to provide information about the volume of activation relative to anatomical structures. This principle is illustrated in Figure 2 with all areas significantly activated by stimuli (simple effects) displayed on a render, thus creating a representation of the overall pattern of activation for this task. In contrast to these large effects, localized effects were observed for the contrast of interest famous faces + unfamiliar faces > scrambled faces and are thus displayed on slices. For group studies, we recommend using the average of normalized participants' T1 volumes as an underlay to more accurately portray the anatomical locations of effects relative to structures (and incidentally show the alignment of anatomical structures among subjects). For instance, results from Wakeman and Henson (2015) shown in Figures 2 and 4 use such an average. By using the average T1, one can better appreciate the amount of smoothness in registration and the true brain coverage. When this approach is impractical, for example when the study involves a between-groups analysis where anatomical differences are expected (such as young versus older adults), we recommend using the ICBM 2009c non-linear asymmetric structural volume (Fonov et al., 2011) ('mni_icbm152_t1_tal_nlin_asym_09c') or the study template, if one was created. When considering individual participants' activations (fMRI, PET), results must be presented on their own structural images, and never on a 'standard' brain, as this leads to inaccurate reporting of the anatomy (Devlin & Poldrack, 2007). As shown in Figure 3, for some participants differences in activation locations between the subject anatomy and the template are small (e.g. participant 15), while for others the anatomy is very different from the template (e.g. participant 3). It should finally be noted that figures do not have to be static.
Science communication is undoubtedly moving away from paper and the Portable Document Format (PDF), and we encourage the community to embrace interactive figures using visualization software such as Papaya (Mango Team, 2016), NiftyView (Deng, 2016), BrainBrowser (Sherif, Kassis, Rousseau, Adalat, & Evans, 2015), BrainNetBrowser (Xia, Wang, & He, 2013), PySurfer (Waskom, Gramfort, Burns, Luessi, & Larson, 2016) or Pycortex (Gao, Huth, Lescroart, & Gallant, 2015). NeuroVault (Gorgolewski et al., 2015) also provides a useful demonstration of such visualizations, where raw statistical maps can be seen and exchanged (see our results from Figure 3 at: https://neurovault.org/collections/4319/).
The second principle for graphic design is the assemblage of visual information. The goal is to provide readers with a visual summary of the different information available to help with inference and interpretation. As discussed in Poldrack et al. (2017), claims of absence of effect and selective activations as well as usage of reverse inference are common in neuroimaging, but are often wrong because they require additional quantitative testing. We contend that these errors partly relate to which information is displayed in figures and how it is displayed, and that better figure designs can help with inference. Absence of statistical significance is not an absence of effect (Killeen, 2005), and in the absence of significance one only fails to reject the null hypothesis (Lakens, 2017; Pernet, 2017). While the error is common in behavioural sciences, it becomes the norm when describing results from statistical parametric maps: if there is no activation (signal above a statistical threshold), one typically infers that there is no effect. Unless equivalence testing or Bayesian statistics are used, it is, however, impossible to infer that a given experiment or comparison did not lead to activation in a given region.
For instance, results from Wakeman and Henson (2015) show a significant activation for faces compared to scrambled faces in the medial fusiform gyri (Figures 2 and 4), but that does not indicate that other regions are not also activated in response to face stimuli, or even more activated by intact faces than scrambled faces. In fact, strong but non-significant effects can also be seen more laterally. Reporting all effects, no matter the level of significance, is the most effective way to convince readers of the results. Showing raw (unthresholded) statistical maps is thus a step in that direction (Jernigan, Gamst, Fennema-Notestine, & Ostergaard, 2003). We recommend going even further by plotting and testing effects for all areas expected a priori to be activated, given the experimental hypotheses. Thanks to the vast literature on face perception, we can generate spatial predictions using meta-analysis engines such as NeuroSynth (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011). Here, we can predict bilateral activations in the posterior middle occipital gyri, lateral fusiform, parahippocampal regions, amygdalae, temporo-parietal junctions and right inferior frontal gyrus (reverse inference map thresholded with a minimal extent of 20 voxels from http://www.neurosynth.org/analyses/terms/face/). As our whole brain analysis did not reveal significant differences between intact faces and scrambled faces in these regions, one might infer that there is no effect. Statistical testing in these a priori ROIs, however, shows that this would be the wrong inference to make, as differences can be observed. As shown in Figure 2, the Bayesian bootstrap of the mean reveals stronger activations for faces than scrambled faces in the right (lateral) fusiform gyrus and left and right middle occipital gyri (i.e. highest density intervals of the difference did not include 0, see Table 1). Because there are expectations (i.e. hypotheses) about where effects should be localized, and because many studies have found effects or differences at these locations, reporting results using such priors is worthwhile, as this allows results to be compared across studies and reduces false inferences.
A practical aspect to consider when using a priori ROIs is how to generate them. When possible, using NeuroSynth or GingerALE (Eickhoff et al., 2009) is recommended, as these meta-analysis tools can generate unbiased ROIs. For investigators interested in checking across the whole brain where the 'in limbo' areas are (above baseline but also under the statistical threshold of significance), a sandwich estimator can be used (de Hollander, Wagenmakers, Waldorp, & Forstmann, 2014), testing if the difference relative to significant areas is itself significant (Gelman & Stern, 2006). Because of the non-stationary spatial nature of baseline activity, summary statistics (average values, T or F values, etc.) displayed on slices and renders can have widely different physiological interpretations, and it is, therefore, essential to plot results for all a priori ROIs but also for all regions seen as significant. It may, of course, be impractical to have all plots in the core of a publication when many results are observed, but it is easy to provide this information as supplementary material, along with csv files or raw data behind the plots. Consider for instance a positive contrast (i.e. the mean value is bigger than 0). Such a result can be obtained from three configurations: (a) all conditions are superior or equal to 0 (e.g. left/right fusiform gyri), (b) all conditions are inferior or equal to 0 and (c) conditions vary around baseline. The first two scenarios are 'easily' interpretable, while the last case is much harder to understand. This is well illustrated in Figure 2, with the contrast famous faces + unfamiliar faces > scrambled faces.
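The Bayesian bootstrap of the mean and the highest density interval used for the a priori ROI tests can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code: the function names, the number of draws, the 95% mass and the per-participant values are all assumptions made for the example.

```python
import random


def bayesian_bootstrap_mean(values, n_draws=2000, seed=0):
    """Draw posterior samples of the mean using flat Dirichlet weights."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Normalised exponential variates form a Dirichlet(1, ..., 1) sample.
        weights = [rng.expovariate(1.0) for _ in values]
        total = sum(weights)
        draws.append(sum(w * v for w, v in zip(weights, values)) / total)
    return draws


def highest_density_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, round(mass * n))  # number of samples inside the interval
    start = min(range(n - k + 1),
                key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Made-up per-participant differences (faces minus scrambled) in one ROI.
differences = [0.42, 0.10, 0.75, 0.31, -0.05, 0.58, 0.22, 0.64,
               0.18, 0.47, 0.09, 0.53, 0.36, 0.27, 0.70, 0.15]
low, high = highest_density_interval(bayesian_bootstrap_mean(differences))
print(f"95% HDI of the mean difference: [{low:.2f}, {high:.2f}]")
```

An interval that excludes 0 corresponds to the criterion quoted in the text for declaring a difference in an ROI; the same two functions can be applied to each condition separately to see which of the three configurations a region falls into.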
The contrast values in significant areas have similar distributions, yet the right MOG has a completely different pattern of response. We, therefore, recommend systematically showing data points (e.g. beta estimates or percentage signal change) for each condition along with the summary statistics of contrasts (typically the mean, but not always). For these plots, showing means and standard deviations using bar graphs is inadequate (Rousselet, Foxe, & Bolam, 2016; Weissgerber et al., 2015), and box plots or violin plots along with data scatter are recommended. Similarly, reconstructed hemodynamic responses can be plotted if they convey enough information about variance across subjects. The perception of a lack of effect in areas not significantly activated leads to incorrect inferences about the selectivity of significantly activated areas, an inferential error known as the 'imager fallacy' (Henson, 2005). Engel and Burton (2013) showed that over 80% of naive readers make this error when looking at a thresholded SPM. What we detect as face-selective depends on both the task and the baseline used (Stark & Squire, 2001), here scrambled faces, and on the statistical threshold used. The issue of selectivity or specificity of activations in the brain has been hotly debated (see, for example, Pernet, Schyns, & Demonet, 2007), but all agree that it requires statistical testing and cannot be inferred merely from showing a qualitatively different activation pattern. What a qualitatively different pattern of activation between stimuli or conditions does allow one to infer (although it would need actual statistical testing of the interaction regions × stimuli or conditions) is that information processing differs in at least one function (function-to-structure deduction as opposed to structure-to-function induction; Henson, 2005).
We, therefore, recommend showing raw statistical maps (Jernigan et al., 2003), assembling results of simple effects and contrasts of interest as shown in Figure 2. This allows addressing, at least visually, issues of the absence of effects, selectivity and qualitative difference. In Figure 4, the raw maps of simple effects allow inferring that information processing was similar across all three conditions because the activation patterns are similar. Sharing such unthresholded statistical maps is also highly encouraged, using online repositories such as NeuroVault (Gorgolewski et al., 2015). The result from the contrast of interest is also shown unthresholded, thus addressing the issue of non-significant areas. It is, however, also important to point out where the evidence supports the existence of an effect, thus highlighting significant areas (Allen, Erhardt, & Calhoun, 2012), here using contours (this can be achieved easily using tools such as nanslice; Wood, 2018).
In the same way that spatial selectivity can wrongly be inferred from the absence of statistically significant results, hemispheric asymmetry is often inferred from thresholded maps. As for selectivity, it is recommended to statistically test for hemispheric differences, going beyond a single level of activation by computing lateralization indexes based on bootstrapped lateralization curves (i.e. using the size and intensity of clusters across all thresholds as, for example, implemented in the LI toolbox; Wilke & Lidzba, 2007). Here, when testing for fusiform activation asymmetry, individual conditions/stimuli were right lateralized (95% CI famous faces [−0.55, −0.14], unfamiliar faces [−0.45, −0.06], scrambled faces [−0.54, −0.13]), while visually, the maps did not reveal this pattern (Figure 5). When considering the whole brain, only famous faces [−0.38, −0.08] and scrambled faces [−0.37, −0.06] show right lateralization (unfamiliar faces [−0.029, 0.022]).
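The idea of a lateralization index evaluated across thresholds and bootstrapped can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of the LI toolbox: it assumes the common definition LI = (L − R) / (L + R) on summed suprathreshold values (so that negative values indicate right lateralization, consistent with the intervals reported above), non-negative thresholds, an unweighted average over thresholds, and made-up voxel values.

```python
import random


def lateralization_index(left, right, threshold):
    """LI = (L - R) / (L + R) on summed values above `threshold`.

    Returns None when no voxel survives the threshold in either hemisphere.
    """
    l_sum = sum(v for v in left if v > threshold)
    r_sum = sum(v for v in right if v > threshold)
    if l_sum + r_sum == 0:
        return None
    return (l_sum - r_sum) / (l_sum + r_sum)


def bootstrap_li_interval(left, right, thresholds, n_boot=1000, seed=0):
    """Percentile 95% interval of the LI averaged over thresholds."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample voxels with replacement within each hemisphere.
        l_sample = rng.choices(left, k=len(left))
        r_sample = rng.choices(right, k=len(right))
        curve = [lateralization_index(l_sample, r_sample, t)
                 for t in thresholds]
        curve = [li for li in curve if li is not None]
        if curve:
            estimates.append(sum(curve) / len(curve))
    if not estimates:
        return None
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Made-up statistic values for voxels in homologous left and right ROIs.
left_voxels = [1.2, 2.1, 0.8, 3.0, 2.4, 1.7, 0.5, 2.9]
right_voxels = [2.8, 3.5, 1.9, 4.2, 3.1, 2.2, 1.4, 3.8]
print(bootstrap_li_interval(left_voxels, right_voxels, [0.0, 1.0, 2.0, 3.0]))
```

An interval lying entirely below 0 would be read as right lateralization under the sign convention assumed here; an interval spanning 0 gives no support for an asymmetry.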
To best understand the pattern of lateralization across stimuli/conditions, we also recommend using paired observations on scatter plots rather than (or in addition to) box plots or violin plots (Rousselet et al., 2016).
The third principle in graphical design is the use of 'accurate' colour scales. Although imaging researchers know to be cautious when conducting their analyses to account for relevant nuisance regressors and determining cluster thresholds, they often pay little attention to selecting colours to visualize the results of their imaging study. When the message is about where active regions are, a single colour can be used. When information about the spatial distribution of statistical values is also of interest, colour palettes (scales) should be used. Such palettes must appropriately convey the underlying data and not introduce perceptual biases. This topic has been investigated at length within other fields of study including geography (Brewer, Hatchard, & Harrower, 2003; Light & Bartlein, 2004; Thyng, Greene, Hetland, Zimmerle, & DiMarco, 2016) and astronomy (Green, 2011), and brain imagers should also be considering this issue. Colours in digital images are often generated as combinations of red, green and blue (RGB) intensities. This is, however, not how colours are perceived by humans. An alternative colour space, CIELAB, has been developed to correspond to the human perceptual system. In 1976, the International Commission on Illumination (Commission Internationale de l'Eclairage, CIE) defined a perceptually uniform colour space that relies on luminance (L*), red-green (A*, ~550–700 nm wavelength) and blue-yellow (B*, ~400–550 nm wavelength) colours (International Color Consortium, 2006). The critical factor is that changes in luminance are better perceived than hue to reflect changes in magnitude (Cleveland & McGill, 1985) and that averaging RGB values does not linearly correspond to changes in luminance.
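The mismatch between RGB averages and perceived luminance can be made concrete with a small worked example. The sketch below computes the CIELAB L* value of an RGB triplet under two assumptions that are not stated in the text: the RGB values are sRGB-encoded in the range 0 to 1, and the reference white is D65 (so that only the relative luminance Y is needed). The function names are illustrative.

```python
def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def cielab_lightness(r, g, b):
    """Return CIELAB L* (0 to 100) for an sRGB colour, assuming a D65 white."""
    # Relative luminance Y from linear RGB (sRGB primaries).
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)
    else:
        f = y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


# Three colours with the same mean RGB value (1/3) but different lightness.
for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(f"{name}: L* = {cielab_lightness(*rgb):.1f}")
```

Pure red, green and blue share the same average RGB intensity, yet their L* values are roughly 53, 88 and 32. A colour map that moves through such colours therefore changes perceived lightness in ways unrelated to the underlying data values.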
'Traditional' colour maps used in brain imaging are not informed by this and instead lead to distortions in how colour intensities are perceived and interpreted (Figure 6). Some colour maps have previously been developed with brain imaging in mind, for example by Ridgway (2009), but these have saturation/luminance issues, as do most other maps. More recently, scientists have generally become more aware of this colour perception artefact with the change of default colour map in some software packages to viridis (Smith & van der Walt, 2015) or parula (Edens, 2014). The luminance function of many sequential, diverging and rainbow colour maps is shown in Appendix S1. For more detailed discussions of luminance effects in colour maps, see Borland and Taylor (2007) and Niccoli (2012). Using the method described in Kovesi (2015), we have developed corrected colour maps now available as .mat (implemented by SPM via spm_colourmap.m), .csv, .cmap (implemented in FSLeyes 0.26.1) and .lut (for MRIcroN, and implemented as .clut in MRICroGL v12) files: https://github.com/CPernet/brain_colours, as demonstrated in Figure 7. The bottom of the figure shows the difference between the common maps and the redesigned, corrected ones. On a linear scale such as hot, the corrected maps show less saturation within clusters, making spatial variations easier to appreciate, as best seen for the right frontal cluster. For diverging maps, the corrected maps better show the differences in space between positive and negative values, as seen here for negative values in the visual cortex or the right anterior fusiform gyrus. Linear colour scales (e.g. hot or BGY) are ideally suited for continuous positive or negative values (e.g. contrast T-maps) while diverging colour scales (e.g. NIH, BWR) are better suited for continuously negative to positive value maps (e.g. contrast F-maps), but it is essential to make them symmetric so that the mid-luminance value reflects the 0 value in the data.
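The symmetry requirement for diverging scales amounts to choosing colour limits of equal magnitude around 0. A minimal sketch, with hypothetical function names and made-up statistic values, shows where the 0 value lands along the colour map with symmetric versus data-driven limits:

```python
def symmetric_limits(values):
    """Return (-m, m), where m is the largest absolute value in the data."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        raise ValueError("all values are zero; no colour range to define")
    return -largest, largest


def colour_position(value, vmin, vmax):
    """Map a value to [0, 1], i.e. its position along the colour map."""
    return (value - vmin) / (vmax - vmin)


statistics = [-1.5, -0.3, 0.2, 2.8, 4.0]  # made-up map values
vmin, vmax = symmetric_limits(statistics)

# With symmetric limits, 0 sits at the midpoint of the colour map.
print(colour_position(0.0, vmin, vmax))
# With limits taken from the data range, 0 is shifted away from the midpoint.
print(colour_position(0.0, min(statistics), max(statistics)))
```

With limits taken directly from the data range, the neutral mid-luminance colour would be assigned to a positive value rather than to 0, which is the distortion the symmetric limits avoid.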
For this reason, we have added the CET-D1 and CET-D7 maps to the repository (referred to as Blue-White-Red (BWR) and Blue-Grey-Yellow (BGY) in Figure 7), the latter having the advantage of no perceptual dead-spot at the centre. Cyclic colour maps are ideal to display information about angles, as in retinotopic mapping or fibre orientation (see Table 2 for a summary of designs). Other colour maps such as rainbow or spectrum should be avoided as they cycle through luminance. Finally, when using 3D renders, because lighting is used to give a three-dimensional aspect to the image, it interferes with the luminance of colour maps, and isoluminant colour maps (also available in the repository) or a single colour should be used. As inferential errors relate mostly to local effects (comparison of neighbouring regions, see above), this is, however, less problematic. Although the focus here is on visualizing tomographic mapping, these principles apply generally to heat maps (also see Gehlenborg & Wong, 2012) and should be of interest to all brain imagers, with other possible applications for magneto- and electroencephalography SPM, scalp maps and source reconstruction figures. Furthermore, it may be desirable to show categorical data rather than continuous data, for example, when showing several anatomical regions-of-interest or graphs of activations in different task conditions. Here we suggest using the distinctive colour sets proposed by Brewer et al. (2003), Wong (2011), or Kelly (1965). See Appendix S2 for examples and details for these colours.
SPMs must be displayed in non-misleading ways. Our recommendations are simple to adopt and, we believe, should help in making further inference from the results. In general, use 3D renders and glass brains for sets of regions and networks; use slices for the precise anatomical location of an effect.
Assemble visual information so as to help with spatial inference, combining simple effects with contrast maps, and use unthresholded maps highlighting significant areas (Allen et al., 2012). Carefully choose colour maps to reflect the magnitude of effects. Plot data for all a priori ROIs regardless of significance and plot data (simple effects and effects of interest) for each region declared significant during analyses.
The authors acknowledge Dr Guillaume Rousselet for comments on an early version of the manuscript.
The authors declare no conflict of interest.
CP and CM contributed equally to the concept and writing up of the article. CP created the new colour maps and analysed the fMRI data.
Reference(s)