Letter · Open access · Peer-reviewed

Zooming in on adipocytes: High and deep

2017; Wiley; Volume 91, Issue 11; Language: English

10.1002/cyto.a.23269

ISSN

1552-4930

Author

Minh Doan


Abstract

In less than three decades, obesity has reached epidemic proportions. Surveys in the United States alone show that nearly two-thirds of the population is overweight and one in three adults is clinically obese 1. Undeniably associated with many serious health disorders, such as type 2 diabetes, cardiovascular disease, and an increased risk of mortality, obesity and its pathogenesis have been the subject of extensive study. One of the most prominent features of a fat cell, the adipocyte, is its large accumulation of lipid droplets (LDs). Clearly visible by light microscopy as round, vacuole-like organelles, LDs were referred to as fat/oil bodies or adiposomes for decades and were largely ignored. The discovery of Perilipin, a protein localized on LD surfaces, later unveiled a complex machinery governing the formation, maintenance, and mobilization of LDs 2. These findings have drawn major research interest to an organelle that has a simple and inert appearance yet is positioned as a central hub for lipid metabolism and many other physiological and pathological processes 3.

In this light, bioimaging has played a crucial role. Seeing is believing: adipocytes and their lipid content have been scrutinized in many studies using advanced microscopes of ever-increasing sensitivity and resolution. LDs in particular are readily quantifiable thanks to their relatively large size (0.5 µm to tens of microns), and their refractive nature makes them visible in phase-contrast microscopy. Moreover, several hydrophobic dyes are available for labeling LDs, such as Oil Red O, Nile Red, and BODIPY. With imagery of adipocytes at hand, researchers then use image analysis software to identify individual adipocytes as well as LDs, mitochondria, and other subcellular entities. Quantitative measurements are extracted for each object, including intensities, sizes, shapes, textures, patterns, correlations, proximities, relationships between components, and so forth.
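As a minimal illustration of this per-object measurement step, the sketch below segments a small synthetic image by a global threshold and extracts a few features per object with scikit-image's `regionprops`. The image, the threshold, and the choice of features are illustrative stand-ins for what real HCA pipelines and stained adipocyte images involve:

```python
import numpy as np
from skimage.measure import label, regionprops

# Synthetic field of view: two bright "droplets" on a dark background
# (a stand-in for a stained adipocyte image).
yy, xx = np.mgrid[0:64, 0:64]
img = np.zeros((64, 64))
img[(yy - 20) ** 2 + (xx - 20) ** 2 < 25] = 1.0   # droplet, radius ~5 px
img[(yy - 45) ** 2 + (xx - 42) ** 2 < 9] = 0.8    # droplet, radius ~3 px

# Segment by a global threshold (illustrative; real images need more care),
# then measure per-object features much as HCA software does.
objects = regionprops(label(img > 0.5), intensity_image=img)
for obj in objects:
    # label id, area in pixels, mean intensity per droplet
    print(obj.label, int(obj.area), round(float(obj.mean_intensity), 2))
```

In practice the same loop would also collect shape, texture, and neighborhood descriptors, yielding the hundreds of features per object discussed below.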
The process of adipocyte differentiation and LD formation can then be described by a vast multiplicity of features and parameters, hence the widely known term high-content analysis (HCA). Among these features, LD metrics are often used to infer the differentiation and maturation status of adipocytes. However, quantifying LDs is not as easy as it may look. As pointed out by Bombrun et al. (DOI: 10.1002/cyto.a.23265; this issue, page 1068), the difficulty stems from multiple sources. (i) The number of individual LDs per experiment is large (potentially many millions); multiplied by the high-content data produced by a high-throughput microscope, this yields a very large data set that is computationally expensive to process. (ii) LDs present in a wide range of sizes, from very small (a few tenths of a micron) to very large (tens of microns), often only one to five pixels apart, with high-intensity background in between. (iii) There are unavoidable sample-to-sample variations and batch effects, owing to the varied proportion of LD-containing cells and varied staining intensity across samples; the latter is a well-known issue in in vitro experiments with cells sourced from different human donors. For such a complex problem, a single solution is likely insufficient. As is increasingly common in modern bioimaging, researchers may need a combination of approaches spanning multiple domains, including high-quality data acquisition, advanced computer-vision tools, efficient data mining, and, most importantly, practical bioinformatics skills, to answer the governing biological question. This multidisciplinary navigation was demonstrated by Bombrun et al.
(this issue, page 1068), who set out to segment individual lipid droplets and measure high-content features in a large-scale, high-throughput RNAi screen for genes involved in adipogenesis, with cells collected from multiple donors. To this end, they developed a two-step segmentation procedure in which the gaps between LDs are first enhanced by computing the maximum and minimum surface curvatures; this filter enhances contrast in bright regions while minimizing variations inside darker regions. A modification of the white top-hat transformation is then used to enhance the difference between dim droplets and their surroundings in cluttered, noisy images. With this improved segmentation, individual LDs were accurately identified, boosting the efficiency of downstream HCA. Using K-means clustering of the extracted features, the authors achieved a clear discrimination between samples with suppressed Perilipin 1 (positive control) and samples treated with randomized RNAi (negative control), yielding a robust methodology to monitor fat-cell evolution in a large-scale screening experiment regardless of donor variation.

In addition to this approach, it is worth mentioning other paradigm shifts that are rapidly changing modern bioimaging and may in turn help adipocyte biology research reach new heights. New computer algorithms continue to deliver more accurate object identification and feature extraction, especially with the use of machine learning. Some image analysis packages have introduced interactive learning and segmentation toolboxes with friendly interfaces. In ilastik, for example, users can build a modular interactive pipeline that trains a classifier to separate foreground from background pixels and thereby facilitates object detection.
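In miniature, this interactive pixel-classification workflow amounts to training a classifier on a handful of annotated pixels and then classifying every pixel in the image. The hedged sketch below uses a random forest and a tiny filter stack as stand-ins for ilastik's much richer filter banks; the synthetic image and the `pixel_features` helper are illustrative, not part of any real tool:

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def pixel_features(img):
    """Per-pixel feature stack: raw intensity, Gaussian-smoothed
    intensity, and gradient magnitude (a crude stand-in for the
    filter banks an interactive tool would offer)."""
    return np.stack([img,
                     ndimage.gaussian_filter(img, 2),
                     ndimage.gaussian_gradient_magnitude(img, 2)], axis=-1)

# Synthetic image: a bright square on a noisy background
rng = np.random.default_rng(0)
img = rng.normal(0.1, 0.02, (40, 40))
img[10:25, 10:25] += 0.8

feats = pixel_features(img).reshape(-1, 3)

# Sparse "scribble" annotations: a few foreground and background pixels
ann = np.full((40, 40), -1)
ann[15:20, 15:20] = 1   # foreground scribble
ann[0:5, 0:5] = 0       # background scribble
ann = ann.ravel()

# Train only on the annotated pixels, then classify every pixel
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(feats[ann >= 0], ann[ann >= 0])
pred = clf.predict(feats).reshape(40, 40)
```

The resulting foreground mask can then feed the object-detection and measurement steps described earlier.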
A few cursor-based annotations are initially needed from the user; from these ground truths, patterns of edges, textures, and pixel intensities are "learned" by the software. The classifier then uses these learned parameters to classify pixels in new images. Recent efforts in computer vision based on representation learning have also greatly advanced the field of instance segmentation. Representation learning is a subset of machine learning in which the learning algorithm is fed raw imagery and instructed to discover on its own the representations needed for detection or classification 4. One form of representation learning, termed deep learning, or more specifically the convolutional neural network (CNN), is designed with multiple levels of representation, in which the representation at one level (starting with the raw image input) is successively transformed into more abstract representations at higher levels. Higher layers of representation amplify the features of the input image that are important for discrimination and suppress irrelevant variations. Operating at the pixel level, CNNs can map each pixel to a point in feature space so that pixels belonging to the same instance lie close together, hence the term instance segmentation. Deep learning can also be used as an unsupervised feature extractor or phenotypic classifier. These approaches have shown impressive results in object detection and quantification, even in complex biomedical images 5, 6. The drawbacks of supervised machine learning, especially deep learning, stem mainly from the fact that these methods are very data-hungry: the training material needs to be sufficiently large and often requires manual annotation, and the training process for a large data set is computationally expensive. For the former, there are opportunities to use data augmentation strategies or, where applicable, to transfer weights from pretrained models.
For the latter, cloud computing may grant researchers access to the needed computing infrastructure. Given the regularity of LD appearance in adipocytes, where LDs present as circular objects with fairly clear borders under the phase-contrast microscope, machine learning should benefit the detection and quantification of individual droplets, even without staining dyes.

There is increasing awareness in the bioimaging community regarding the proper use of HCA and its high dimensionality, an issue addressed in several systematic studies and reviews 7, 8. Image-processing packages can extract hundreds to thousands of features per identified object per channel, such as intensities, shapes, textures, relationships, and so forth, as mentioned before. Most of these features are designed and engineered based on human visual perception. They can readily be presented as readouts of intrinsic biological processes or as descriptors of perturbations, leading directly to biological interpretations. However, because of this direct link, earlier HCA studies tended to use only a few features of cells, or to target only "hits" that support a given hypothesis, leaving a vast number of qualitative and quantitative variables untouched 9. For example, the size and number of LDs are often used as the sole parameters to infer the differentiation stage and type of adipocytes. As illustrated in Figure 1, this interpretation may overlook certain distinctive phenotypes. For this reason, lipid droplet size and number alone may not be sound phenotypic readouts when the phenotype in question is mixed in a complex co-culture of preadipocytes and brown, beige, and white adipose tissue 10, 11.

Figure 1. Diagram depicting the varied lipid contents of different adipocyte subtypes. The sizes and numbers of lipid droplets of cells A and B are clearly distinguishable, while those of cells A and C are identical.
Thus, lipid droplet size and number alone may fail to differentiate cells A and C in a mixture; incorporating additional high-content features, such as textures and spatial distributions, may help.

Thanks to intensive discussion of this issue, more recent studies indeed pay attention to the full feature space of images and use it as an unbiased source of quantitative information to characterize cells 7. Nevertheless, there is an optimal balance between the number of samples and the number of features: plenty of samples with too few attributes, as in traditional low-content high-throughput analyses, or too few samples with too many attributes, as in an unnecessarily complex HCA of a few hundred cells. In the latter case, the quality of phenotype classification suffers from redundant and irrelevant attributes, a problem dubbed "the curse of dimensionality." A large body of machine learning work addresses the robust selection of relevant feature subsets 12, namely feature selection and feature mapping. With a high-quality feature set, cellular objects can then be classified objectively (as opposed to manual gating) into the correct phenotypes by supervised learning, or presented as clusters of phenotypes by unsupervised learning; the latter approach can bring previously uncharacterized cell states to light 13. These developments are collectively well suited to monitoring changing morphologies in a complex cellular process such as adipocyte differentiation. Over a period that could be as long as 30 days, an in vitro differentiated adipocyte may pass through a continuous spectrum of changes, from a preadipocyte with no lipid droplets and a large intracellular space to a differentiated adipocyte filled with large lipid droplets and a constrained space for other subcellular elements, including a squeezed nucleus.
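The value of discarding uninformative features before unsupervised phenotyping can be sketched on simulated data. In this hedged example, two simulated phenotypes differ in only 2 of 22 per-cell features; a simple variance filter (one of many possible feature-selection strategies) removes the near-constant ones before K-means clustering. All data and thresholds are synthetic and illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(1)

# Simulated per-cell HCA table: 2 informative features that separate
# two phenotypes, plus 20 near-constant (uninformative) features.
n = 200
informative = np.vstack([rng.normal(0, 1, (n // 2, 2)),
                         rng.normal(8, 1, (n // 2, 2))])
noise = rng.normal(0, 0.01, (n, 20))
X = np.hstack([informative, noise])

# Drop near-constant features before clustering
X_sel = VarianceThreshold(threshold=0.1).fit_transform(X)
print(X_sel.shape[1])  # only the informative features survive

# Unsupervised phenotyping: the two simulated phenotypes should
# now fall into two clean clusters of 100 cells each.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_sel)
print(sorted(np.bincount(km.labels_).tolist()))
```

Real feature selection is rarely this clean, but the same principle, prune redundant and irrelevant attributes before clustering, underlies the methods cited above.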
The morphological switch from preadipocyte to adipocyte may in fact happen subtly and gradually 14, and may involve intermediate phenotypes that have not been well characterized before. The "content analysis" of an adipocyte should therefore not be limited to the lipid droplet regions, but should follow the dynamic changes in the rest of the intracellular space. To this end, the new advances in computer vision and machine learning will greatly aid scientists in investigating adipogenesis or, conceptually, in monitoring any chronic progression.

Bridges have been built to bring novel methodologies such as machine learning into the common practice of HCA. Besides the continuous advancement of analysis software, over the last few decades many general-purpose and science-oriented programming languages (Python, R, MATLAB, etc.) were introduced with a focus on code readability, making them more welcoming to researchers from different fields. Within these environments, libraries and scientific computing frameworks are built and delivered as "packages" with user-friendly application programming interfaces (APIs). Much like a graphical user interface, an API makes it easier for users to access the definitions, calls, functions, and routines of a program. In this light, the free and open-source machine learning libraries scikit-image, scikit-learn, TensorFlow, Keras, and Caffe, as well as the commercial image processing and machine learning toolboxes of MATLAB, are worth mentioning among many others. These suites provide flexible and efficient means for bioinformaticians and developers to build syntactically simple applications for image and data analysis. Building on these fundamental frameworks, many studies have delivered excellent in-house applications or workflows that not only serve a specific biological question well, but can also be generalized for broader purposes.
In addition, there is a noticeable trend in which studies publicly release their scripts with detailed instructions and tutorials, itemized actions, and example data. Often hosted on the popular repository service GitHub, the code and workflows can easily be shared, downloaded, or co-developed under real-time version control, as exemplified by the current work of Bombrun and coworkers. This encouraging practice not only helps readers appreciate the scientific findings of published work, but also allows them to implement the techniques quickly in their own studies. Furthermore, thanks to the transparency of the materials, the work can be revisited and reproduced, and the quality and efficacy of the method can be continuously improved with the help of an active community, benefiting multiple parties.

No conflict of interest to declare.
