Open-access, peer-reviewed article

Deep-Learning–Driven Quantification of Interstitial Fibrosis in Digitized Kidney Biopsies

2021; Elsevier BV; Volume: 191; Issue: 8; Language: English

10.1016/j.ajpath.2021.05.005

ISSN

1525-2191

Authors

Yi Zheng, Clarissa A. Cassol, Saemi Jung, Divya Veerapaneni, Vipul C. Chitalia, Kevin Ren, Shubha S. Bellur, Peter Boor, Laura Barisoni, Sushrut S. Waikar, Margrit Betke, Vijaya B. Kolachalama

Topic(s)

Radiomics and Machine Learning in Medical Imaging

Abstract

Interstitial fibrosis and tubular atrophy (IFTA) on a renal biopsy are strong indicators of disease chronicity and prognosis. Techniques that are typically used for IFTA grading remain manual, leading to variability among pathologists. Accurate IFTA estimation using computational techniques can reduce this variability and provide quantitative assessment. Using trichrome-stained whole-slide images (WSIs) processed from human renal biopsies, we developed a deep-learning framework that captured finer pathologic structures at high resolution and overall context at the WSI level to predict IFTA grade. WSIs (n = 67) were obtained from The Ohio State University Wexner Medical Center. Five nephropathologists independently reviewed them and provided fibrosis scores that were converted to IFTA grades: ≤10% (none or minimal), 11% to 25% (mild), 26% to 50% (moderate), and >50% (severe). The model was developed by associating the WSIs with the IFTA grade determined by majority voting (reference estimate). Model performance was evaluated on WSIs (n = 28) obtained from the Kidney Precision Medicine Project. There was good agreement on the IFTA grading between the pathologists and the reference estimate (κ = 0.622 ± 0.071). The accuracy of the deep-learning model was 71.8% ± 5.3% on The Ohio State University Wexner Medical Center and 65.0% ± 4.2% on Kidney Precision Medicine Project data sets. Our approach to analyzing microscopic- and WSI-level changes in renal biopsies attempts to mimic the pathologist and provides a regional and contextual estimation of IFTA. Such methods can assist clinicopathologic diagnosis.

Renal biopsy is an integral part of clinical work-up for patients with several kidney diseases,[1] as it provides diagnostic and prognostic information that guides treatment.
Despite such integral clinical use, current assessment of renal biopsy suffers from some limitations.[2] Evaluation of clinically relevant pathologic features, such as the amount of interstitial fibrosis and tubular atrophy (IFTA), an important prognostic indicator, is based mainly on visual estimation and semiquantitative grading and hence may not reveal relationships that are not immediately evident using compartmentalized approaches.[3] Such estimates do not capture finer details or heterogeneity across an entire slide, and therefore may not be optimal for analyzing renal tissues with complex histopathology. These aspects underline the need for leveraging advances in digital pathology and developing modern data analytic technologies, such as deep learning (DL), for comprehensive image analysis of kidney pathology.

DL techniques that utilize digitized images of biopsies are increasingly considered to facilitate the routine workflow of a pathologist. There has been a surge of publications showcasing DL applications in clinical medicine and biomedical research, including those in nephrology and nephropathology.[4-9] Specifically, DL techniques, such as convolutional neural networks, have been widely used for the analysis of histopathologic images. In the context of kidney diseases, researchers have been able to produce highly accurate methods to evaluate disease grade, segment various kidney structures, and predict clinical phenotypes.[10-18]
Although this body of work is highly valuable, almost all of it focuses on analyzing high-resolution whole-slide images (WSIs) by binning them into smaller patches (or tiles) or resizing the images to a lower resolution, and associating them with various outputs of interest. These techniques have various advantages and limitations. While the patch-based approaches maintain image resolution, analyzing each patch independently cannot preserve the spatial relevance of that patch in the context of the entire WSI. In contrast, resizing the WSI to a lower resolution can be a computationally efficient approach but may not allow one to capture the finer details present within a high-resolution WSI. The goal of this study was to develop a computational pipeline that can process WSIs to accurately capture the IFTA grade.
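The two strategies contrasted above can be illustrated with a minimal NumPy sketch: tiling a WSI into fixed-size patches versus downsampling the whole slide to a single low-resolution image. The array-based "slide," the function names, and the 508-pixel size (borrowed from the Methods below) are illustrative assumptions, not code from the study.

```python
import numpy as np

def tile_wsi(wsi: np.ndarray, patch: int = 508):
    """Split a slide image (H, W, 3) into non-overlapping patch x patch tiles,
    keeping each tile's location as fractions of the full slide.
    Edge remainders are dropped for simplicity."""
    H, W, _ = wsi.shape
    tiles = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            box = (x / W, y / H, (x + patch) / W, (y + patch) / H)
            tiles.append((wsi[y:y + patch, x:x + patch], box))
    return tiles

def downsample_wsi(wsi: np.ndarray, size: int = 508):
    """Crude nearest-neighbor downsample of the whole slide to size x size."""
    H, W, _ = wsi.shape
    ys = np.linspace(0, H - 1, size).astype(int)
    xs = np.linspace(0, W - 1, size).astype(int)
    return wsi[np.ix_(ys, xs)]

# Toy example with a small synthetic "slide" standing in for a real WSI reader.
wsi = np.random.randint(0, 255, size=(2032, 3048, 3), dtype=np.uint8)
print(len(tile_wsi(wsi)), downsample_wsi(wsi).shape)  # 24 (508, 508, 3)
```

The first function preserves full resolution but discards whole-slide context; the second preserves context but discards fine detail, which is the trade-off the proposed pipeline is designed to bridge.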
The nephropathologist's approach to grading the biopsy slides under a microscope was emulated. A typical workflow by the expert involves manual operations, such as panning as well as zooming in and out of specific regions on the slide to evaluate various aspects of the pathology. In the zoomed-out assessment, pathologists review the entire slide and perform a global, or WSI-level, evaluation of the kidney core. In the zoomed-in assessment, they perform an in-depth, microscopic evaluation of local pathology in the regions of interest. Together, these assessments allow them to comprehensively evaluate the kidney biopsy, including estimation of the IFTA grade. We hypothesized that a computational approach based on DL could mimic the process that nephropathologists use when evaluating kidney biopsy images. Using WSIs and their corresponding IFTA grades from two distinct cohorts, the following objectives were addressed. First, the framework needed to process image subregions (or patches) and quantify the extent of IFTA within those patches. Second, the framework needed to process each image patch in the context of its environment and assess IFTA on the WSI. A computational pipeline based on deep learning was therefore developed that incorporates patterns and features from the local patches along with information from the WSI in its entirety to provide context for the patches. Through this combination of patch- and global-level data, the model was designed to accurately predict the IFTA grade. An international team of practicing nephropathologists evaluated the digitized biopsies and provided the IFTA grades. The WSIs and their corresponding IFTA grades were used to train and validate the DL model. The DL model was also compared with a modeling framework based on traditional computer vision and machine learning that uses image descriptors and textural features. The performance of the DL model is reported, along with identified image subregions that are highly associated with the IFTA grade.

De-identified WSIs of trichrome-stained kidney biopsies of patients submitted to The Ohio State University Wexner Medical Center (OSUWMC) were obtained. Renal biopsy as well as patient data collection, staining, and digitization followed protocols approved by the Institutional Review Board at OSUWMC (study number: 2018H0495) (Table 1). De-identified WSIs were also obtained from the following recruitment sites of the Kidney Precision Medicine Project (KPMP): Brigham and Women's Hospital, Cleveland Clinic, Columbia University, Johns Hopkins University, and University of Texas Southwestern, Dallas. KPMP is a multiyear project funded by the National Institute of Diabetes and Digestive and Kidney Diseases with the purpose of understanding and finding new ways to treat chronic kidney disease and acute kidney injury. Race/ethnicity information was obtained directly from the OSUWMC records and the KPMP website.

Table 1: Data from The Ohio State University Wexner Medical Center
Description | Value | Units
Patients | 64 | n
Whole-slide images | 67 | n
Age (0–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89) (binned)* | 2, 1, 12, 4, 10, 9, 17, 4, 1 | years
Sex (males, females) | 34, 30 | n
Patients per ethnicity (White, Black, others, unknown) | 35, 10, 4, 15 | n
Creatinine, median (range)† | 1.5 (0.3–10.9) | mg/dL
Proteinuria, median (range)‡ | 4 (0.5–22) | g/day
The cases obtained from The Ohio State University Wexner Medical Center are shown.
A single trichrome-stained biopsy slide was digitized for each patient. *Age was unavailable for four patients. †Creatinine values were unavailable for 11 patients. ‡Proteinuria values were unavailable for 13 patients.

All WSIs were uploaded to secure, web-based software (PixelView; deepPath, Inc., Boston, MA). C.A.C. served as the group administrator of the software account and provided separate access to the WSIs to the other nephropathologists (K.R., S.S.B., L.M.B., and P.B.), who were assigned as users to the group account. This process allowed each expert to independently evaluate the digitized biopsies. The KPMP WSIs and associated clinical data were obtained following review and approval of the Data Usage Agreement between KPMP and Boston University (Table 2 and Supplemental Table S1). All methods were performed according to federal guidelines and regulations.

Renal tissues consisted of needle biopsy samples received at OSUWMC and from KPMP participants. All OSUWMC biopsies were scanned using a WSI scanner [Aperio (Leica Biosystems, Wetzlar, Germany) or NanoZoomer (Hamamatsu, Hamamatsu City, Japan)] at ×40 apparent magnification, resulting in WSIs with a resolution of 0.25 μm per pixel (Supplemental Figure S1). All the WSIs from KPMP were generated by digitizing renal biopsies using Aperio AT2 high-volume scanners (Leica Biosystems) at ×40 apparent magnification with a resolution of 0.25 μm per pixel (Figure 1). More details on the pathology protocol can be obtained directly from the KPMP website (https://www.kpmp.org, last accessed May 1, 2021). The Aperio-based WSIs were obtained in SVS format, and the Hamamatsu-based WSIs were obtained in NDPI format.

Table 2: Data from the Kidney Precision Medicine Project
Description | Value | Units
Participants | 14 | n
Whole-slide images | 28 | n
Age (30–39, 40–49, 50–59, 60–69, 70–79) (binned) | 4, 0, 1, 7, 2 | years
Sex (males, females) | 7, 7 | n
Patients per ethnicity (White, Black, others, unknown) | 10, 3, 0, 1 | n
Baseline eGFR (<15, 15–29, 30–59, 60–89, ≥90) (binned) | 0, 1, 7, 3, 3 | mL/minute per 1.73 m2
Proteinuria (…

Each nephropathologist independently reviewed the OSUWMC WSIs and provided fibrosis scores, which were converted to IFTA grades: ≤10% (none or minimal), 11% to 25% (mild), 26% to 50% (moderate), and >50% (severe).[19] The final IFTA grades were computed by performing majority voting on the grades obtained from each nephropathologist. The fibrosis scores for the KPMP data set were obtained directly from the study investigators and converted to IFTA grades using the same criterion. The derived IFTA grades from both data sets were used for further analysis.

Our DL architecture is based on combining the features learned at the global level of the WSI with those learned from local high-resolution image patches of the WSI (Figure 2A). Similar DL architectures have recently been applied to computer vision-related tasks.[20-24]
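The fibrosis-score-to-IFTA-grade conversion and the majority-vote reference estimate described above can be summarized in a short sketch. The function names and the tie-handling behavior are illustrative assumptions; only the grade cutoffs come from the text.

```python
from collections import Counter

def fibrosis_to_ifta_grade(fibrosis_pct: float) -> int:
    """Map a fibrosis percentage to an IFTA grade:
    0 = none/minimal (<=10%), 1 = mild (11%-25%),
    2 = moderate (26%-50%), 3 = severe (>50%)."""
    if fibrosis_pct <= 10:
        return 0
    if fibrosis_pct <= 25:
        return 1
    if fibrosis_pct <= 50:
        return 2
    return 3

def reference_estimate(grades):
    """Majority vote across the grades assigned by the nephropathologists.
    Ties are broken by first occurrence here; the paper does not specify a rule."""
    return Counter(grades).most_common(1)[0][0]

# Example with hypothetical fibrosis scores from five pathologists for one WSI.
scores = [30, 35, 20, 40, 45]
grades = [fibrosis_to_ifta_grade(s) for s in scores]
print(grades, reference_estimate(grades))  # [2, 2, 1, 2, 2] 2
```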
This architecture is referred to as glpathnet. Briefly, glpathnet comprises three arms: i) a local branch, ii) a global branch, and iii) an ensemble branch (Figure 2A). The local branch receives cropped, filtered patches from the original images as input to a Feature Pyramid Network (FPN) model.[25] The FPN uses an efficient architecture that leverages multiple feature maps at low and high resolutions to detect objects at different scales. Cropped image patches (Np × Np pixels) were automatically extracted from the original WSIs and labeled as tissue or background using the following criterion: patches in which at least 50% of the pixels contained tissue were labeled as tissue; all others were labeled as background. The local branch, containing the image patches labeled as tissue, was fed into the FPN model. The global branch, containing downsampled low-resolution versions (Ng × Ng pixels) of the original WSIs, served as input to another FPN model.

To enable local and global feature interaction, the feature maps from all layers of each branch were shared with the other (Figure 2B). The global branch crops its feature maps at the same spatial location as the current local patch. To interact with the local branch, glpathnet upsamples the cropped global feature maps to the size of the local-branch feature maps in the layer at the same depth. Subsequently, glpathnet concatenates the local feature maps and the cropped global feature maps, which are fed into the next layer of the local branch. In a symmetrical manner, the local branch downsamples its feature maps to the same relative spatial ratio as the patches cropped from the original input image. On the basis of the location of the cropped patches, the downsampled local feature maps are merged into feature maps of the same size as the global-branch features in the same layer; feature maps of all zeros are used for the patches labeled as background. The global feature maps and the merged local feature maps are then concatenated and fed into the next layer of the global branch. The ensemble branch of glpathnet contains a convolutional layer followed by a fully connected layer. It takes the concatenated feature maps from the last layer of the local branch and the corresponding maps from the global branch. The output of the ensemble branch is a patch-level IFTA grade, and the final IFTA grade was determined as the most common patch-level IFTA grade.
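A minimal PyTorch sketch of the local-global feature exchange is shown below for a single pyramid level. The tensor names, shapes, and cropping arithmetic are illustrative assumptions; the published glpathnet repository (linked below) is the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def fuse_local_with_global(local_feat, global_feat, patch_box):
    """Crop the global feature map at the current patch location,
    upsample it to the local feature map's size, and concatenate.

    local_feat:  (B, C, h, w)   features of one high-resolution patch
    global_feat: (B, C, H, W)   features of the downsampled whole slide
    patch_box:   (x0, y0, x1, y1) patch location as fractions of the WSI
    """
    _, _, H, W = global_feat.shape
    x0, y0, x1, y1 = patch_box
    # Crop the region of the global features that corresponds to the patch.
    crop = global_feat[:, :,
                       int(y0 * H):max(int(y1 * H), int(y0 * H) + 1),
                       int(x0 * W):max(int(x1 * W), int(x0 * W) + 1)]
    # Upsample the crop so it matches the local feature resolution.
    crop = F.interpolate(crop, size=local_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Channel-wise concatenation feeds the next layer of the local branch;
    # the symmetric global-branch merge (not shown) works in reverse.
    return torch.cat([local_feat, crop], dim=1)

# Toy example with hypothetical shapes.
local = torch.randn(1, 256, 64, 64)
global_ = torch.randn(1, 256, 64, 64)
fused = fuse_local_with_global(local, global_, (0.25, 0.25, 0.5, 0.5))
print(fused.shape)  # torch.Size([1, 512, 64, 64])
```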
Cross-entropy loss was used to train glpathnet on the OSUWMC data using a pretrained DL architecture (ResNet50)[26] as part of the convolutional network of the FPN model. To maximize efficiency, both Np and Ng were set to 508 pixels. The Adam optimizer (β1 = 0.9; β2 = 0.999) was used for model training with a batch size of 6. The initial learning rate was set to 2 × 10^-5 for the local branch and 1 × 10^-4 for the global branch. Glpathnet was implemented in PyTorch, and model training was performed on a graphics processing unit (GPU) workstation containing GeForce RTX 2080 Ti graphics cards (NVIDIA, Santa Clara, CA) with 11 GB of GDDR6 memory. Model training took <16 hours to reach convergence. Prediction of the IFTA grade on a new WSI that was not used for model training took approximately 30 seconds.

For comparison, an IFTA classification model was constructed based on traditional machine learning using features derived from the OSUWMC WSI data. The weighted neighbor distance using a compound hierarchy of algorithms representing morphology (WND-CHARM)[27-29] was used; it is a multipurpose image classifier that can extract approximately 3000 generic image descriptors, including polynomial decompositions, high-contrast features, pixel statistics, and textures (Supplemental Table S2). These features were derived directly from the raw WSI, transforms of the WSI, and compound transforms of the WSI (transforms of transforms). Using these features as inputs, a four-label classifier was constructed to predict the final IFTA grade. The model was trained on the OSUWMC data set, and the KPMP data set was used for testing.

The final IFTA grade (reference estimate) was determined by taking the majority vote on the IFTA grading among all five nephropathologists. The agreement between the nephropathologists was computed using κ scores between each pathologist's grade and the reference estimate. The percentage agreement between the pathologists, and between the pathologists and the reference estimate, was also computed. For the DL model trained on the OSUWMC data set, fivefold cross-validation was performed, and the average model accuracy, sensitivity/recall, specificity, precision, and κ scores were reported. Sensitivity/recall measures the proportion of true positives that are correctly identified, specificity measures the proportion of true negatives that are correctly identified, and precision is the fraction of true positives over the total number of positive calls.

Computer scripts and manuals are made available on GitHub (https://github.com/vkola-lab/ajpa2021, last accessed May 1, 2021). Data from OSUWMC can be obtained on request and subject to institutional approval. Data from KPMP can be freely downloaded (https://atlas.kpmp.org/repository, last accessed May 1, 2021).
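The reported optimizer settings can be expressed as a short PyTorch sketch. The stand-in branch modules and the single training step are illustrative assumptions; only the learning rates, betas, batch size, patch size, number of classes, and loss come from the text (torchvision ≥0.13 is assumed for the pretrained-weights API).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Stand-ins for the two branches; in glpathnet each is the backbone of an FPN
# and the branches exchange feature maps layer by layer (not shown here).
local_branch = resnet50(weights="IMAGENET1K_V1")
local_branch.fc = nn.Linear(local_branch.fc.in_features, 4)   # four IFTA grades
global_branch = resnet50(weights="IMAGENET1K_V1")
global_branch.fc = nn.Linear(global_branch.fc.in_features, 4)

# Adam with the per-branch learning rates and betas reported in the text.
optimizer = torch.optim.Adam(
    [
        {"params": local_branch.parameters(), "lr": 2e-5},
        {"params": global_branch.parameters(), "lr": 1e-4},
    ],
    betas=(0.9, 0.999),
)
criterion = nn.CrossEntropyLoss()

# One illustrative step on a hypothetical batch of six 508 x 508 patches.
patches = torch.randn(6, 3, 508, 508)
labels = torch.randint(0, 4, (6,))
loss = criterion(local_branch(patches), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```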
Informed consent was not required, as all obtained data were de-identified.

There was good agreement on the IFTA grading between the nephropathologists, with pairwise agreements ranging from 0.48 to 0.63 (Figure 3A). Interpathologist ratings assessed using pairwise κ showed moderate agreement, ranging from 0.31 to 0.50 (Figure 3B). There was good agreement when each pathologist's grading was compared with the reference IFTA grade (κ = 0.622 ± 0.071). Note that this agreement must be interpreted in light of the fact that the reference IFTA grade was itself derived from the pathologists' grades.

The DL model (glpathnet) accurately predicted the IFTA grade on the OSUWMC data (accuracy = 71.8% ± 5.3%), based on fivefold cross-validation (Figure 4). The patch-level model predictions also consistently predicted the IFTA grade, as indicated by the class-level receiver operating characteristic (ROC) curves (Figure 4, A–D). For the minimal IFTA label, the patch-level cross-validated model resulted in an area under the ROC curve of 0.65 ± 0.04. For the mild IFTA label, the area under the ROC curve was 0.67 ± 0.04; for the moderate IFTA label, 0.68 ± 0.09; and for the severe IFTA label, 0.76 ± 0.06. For each class label, the cross-validated model performance on the WSIs was evaluated by computing the mean precision, mean sensitivity, and mean specificity along with their respective standard deviations (Figure 4E). For the minimal IFTA label, the precision was 0.82 ± 0.17, the sensitivity was 0.77 ± 0.08, and the specificity was 0.93 ± 0.07. For the mild IFTA label, the precision was 0.71 ± 0.15, the sensitivity was 0.68 ± 0.10, and the specificity was 0.91 ± 0.05. For the moderate IFTA label, the precision was 0.82 ± 0.10, the sensitivity was 0.73 ± 0.20, and the specificity was 0.93 ± 0.05. Finally, for the severe IFTA label, the precision was 0.65 ± 0.06, the sensitivity was 0.72 ± 0.16, and the specificity was 0.88 ± 0.04. Because of the way specificity was computed for the model (ie, minimal versus not minimal, mild versus not mild, moderate versus not moderate, and severe versus not severe), its values were generally higher than precision and sensitivity in all cases. Fivefold cross-validation indicated good agreement between the true and predicted IFTA grades on the OSUWMC data (κ = 0.62 ± 0.07) (Supplemental Figure S3).

Class activation mapping (CAM) was performed on the WSIs to explore the regions that are highly associated with the output class label (Figure 5 and Supplemental Figure S4). Two different strategies were used to generate CAMs. The first method generated CAMs at the WSI (or global) level without utilizing the local features, whereas the second method generated CAMs that synthesized features at the local and global levels. Although both strategies generated CAMs that highlighted image subregions, the CAMs based on the model that combined local and global representations showed a higher qualitative association with the output label. Patch-level probabilities with a high degree of association with the IFTA grade were generated (Figure 6). Each image patch and its set of probability values were reviewed by a nephropathologist (C.A.C.), who confirmed that the patch-level patterns were consistent with model predictions.
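For readers who wish to reproduce the agreement statistics, the sketch below computes a majority-vote reference estimate, per-pathologist Cohen's κ against that reference, and pairwise percentage agreement on hypothetical grades; it uses scikit-learn and NumPy and is not drawn from the study's code.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical IFTA grades (0-3) from five pathologists for ten WSIs:
# rows = pathologists, columns = slides.
np.random.seed(0)
grades = np.random.randint(0, 4, size=(5, 10))

# Majority-vote reference estimate per slide (pure NumPy majority vote).
reference = np.array([np.bincount(col, minlength=4).argmax() for col in grades.T])

# Kappa of each pathologist against the reference estimate.
kappas = [cohen_kappa_score(grades[i], reference) for i in range(grades.shape[0])]
print(f"kappa vs reference: {np.mean(kappas):.3f} +/- {np.std(kappas):.3f}")

# Pairwise percentage agreement between pathologists.
pairwise = [np.mean(grades[i] == grades[j])
            for i in range(5) for j in range(i + 1, 5)]
print(f"pairwise agreement range: {min(pairwise):.2f} to {max(pairwise):.2f}")
```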

Reference(s)

1. Amann K, Haas CS. What you should know about the work-up of a renal biopsy. Nephrol Dial Transpl 2006; 21: 1157-1161.
2. Farris AB, Alpers CE. What is the best way to measure renal fibrosis?: a pathologist's perspective. Kidney Int Suppl (2011) 2014; 4: 9-15.
3. Farris AB, Adams CD, Brousaides N, Della Pelle PA, Collins AB, Moradi E, Smith RN, Grimm PC, Colvin RB. Morphometric and visual evaluation of fibrosis in renal biopsies. J Am Soc Nephrol 2011; 22: 176-186.
4. Becker JU, Mayerich D, Padmanabhan M, Barratt J, Ernst A, Boor P, Cicalese PA, Mohan C, Nguyen HV, Roysam B. Artificial intelligence and machine learning in nephropathology. Kidney Int 2020; 98: 65-75.
5. Barisoni L, Lafata KJ, Hewitt SM, Madabhushi A, Balis UGJ. Digital pathology and computational image analysis in nephropathology. Nat Rev Nephrol 2020; 16: 669-685.
6. Sealfon RSG, Mariani LH, Kretzler M, Troyanskaya OG. Machine learning, the kidney, and genotype-phenotype analysis. Kidney Int 2020; 97: 1141-1149.
7. Saez-Rodriguez J, Rinschen MM, Floege J, Kramann R. Big science and big data in nephrology. Kidney Int 2019; 95: 1326-1337.
8. Niel O, Bastard P. Artificial intelligence in nephrology: core concepts, clinical applications, and perspectives. Am J Kidney Dis 2019; 74: 803-810.
9. Santo BA, Rosenberg AZ, Sarder P. Artificial intelligence driven next-generation renal histomorphometry. Curr Opin Nephrol Hypertens 2020; 29: 265-272.
10. Kannan S, Morgan LA, Liang B, Cheung MG, Lin CQ, Mun D, Nader RG, Belghasem ME, Henderson JM, Francis JM, Chitalia VC, Kolachalama VB. Segmentation of glomeruli within trichrome images using deep learning. Kidney Int Rep 2019; 4: 955-962.
11. Kolachalama VB, Singh P, Lin CQ, Mun D, Belghasem ME, Henderson JM, Francis JM, Salant DJ, Chitalia VC. Association of pathological fibrosis with renal survival using deep neural networks. Kidney Int Rep 2018; 3: 464-475.
12. Marsh JN, Matlock MK, Kudose S, Liu TC, Stappenbeck TS, Gaut JP, Swamidass SJ. Deep learning global glomerulosclerosis in transplant kidney frozen sections. IEEE Trans Med Imaging 2018; 37: 2718-2728.
13. Chagas P, Souza L, Araujo I, Aldeman N, Duarte A, Angelo M, Dos-Santos WLC, Oliveira L. Classification of glomerular hypercellularity using convolutional features and support vector machine. Artif Intell Med 2020; 103: 101808.
14. Hermsen M, de Bel T, den Boer M, Steenbergen EJ, Kers J, Florquin S, Roelofs J, Stegall MD, Alexander MP, Smith BH, Smeets B, Hilbrands LB, van der Laak J. Deep learning-based histopathologic assessment of kidney tissue. J Am Soc Nephrol 2019; 30: 1968-1979.
15. Jayapandian CP, Chen Y, Janowczyk AR, Palmer MB, Cassol CA, Sekulic M, Hodgin JB, Zee J, Hewitt SM, O'Toole J, Toro P, Sedor JR, Barisoni L, Madabhushi A. Development and evaluation of deep learning-based segmentation of histologic structures in the kidney cortex with multiple histologic stains. Kidney Int 2020; 99: 86-101.
16. Uchino E, Suzuki K, Sato N, Kojima R, Tamada Y, Hiragi S, Yokoi H, Yugami N, Minamiguchi S, Haga H, Yanagita M, Okuno Y. Classification of glomerular pathological findings using deep learning and nephrologist-AI collective intelligence approach. Int J Med Inform 2020; 141: 104231.
17. Ginley B, Lutnick B, Jen KY, Fogo AB, Jain S, Rosenberg A, Walavalkar V, Wilding G, Tomaszewski JE, Yacoub R, Rossi GM, Sarder P. Computational segmentation and classification of diabetic glomerulosclerosis. J Am Soc Nephrol 2019; 30: 1953-1967.
18. Bouteldja N, Klinkhammer BM, Bulow RD, Droste P, Otten SW, Freifrau von Stillfried S, Moellmann J, Sheehan SM, Korstanje R, Menzel S, Bankhead P, Mietsch M, Drummer C, Lehrke M, Kramann R, Floege J, Boor P, Merhof D. Deep learning-based segmentation and quantification in experimental kidney histopathology. J Am Soc Nephrol 2021; 32: 52-68.
19. Srivastava A, Palsson R, Kaze AD, Chen ME, Palacios P, Sabbisetti V, Betensky RA, Steinman TI, Thadhani RI, McMahon GM, Stillman IE, Rennke HG, Waikar SS. The prognostic value of histopathologic lesions in native kidney biopsy specimens: results from the Boston Kidney Biopsy Cohort Study. J Am Soc Nephrol 2018; 29: 2213-2224.
20. Chen W, Jiang Z, Wang Z, Cui K, Qian X. Collaborative global-local networks for memory-efficient segmentation of ultra-high resolution images. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 8924-8933.
21. Lin CY, Chiu YC, Ng HF, Shih TK, Lin KH. Global-and-local context network for semantic segmentation of street view images. Sensors (Basel) 2020; 20: 2907.
22. Wang S, Ye Z, Wang Y. GLNet for target detection in millimeter wave images. Proceedings of the 3rd International Conference on Multimedia and Image Processing (ICMIP), 2018: 12-16.
23. Wu T, Lei Z, Lin B, Li C, Qu Y, Xie Y. Patch proposal network for fast semantic segmentation of high-resolution images. Proc AAAI Conf Artif Intell 2020; 34: 12402-12409.
24. Zhang S, Song L, Gao C, Sang N. GLNet: global local network for weakly supervised action localization. IEEE Trans Multimedia 2020; 22: 2610-2622.
25. Lin TY, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ. Feature pyramid networks for object detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 936-944.
26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
27. Orlov N, Shamir L, Macura T, Johnston J, Eckley DM, Goldberg IG. WND-CHARM: multi-purpose image classification using compound image transforms. Pattern Recognit Lett 2008; 29: 1684-1693.
28. Shamir L, Orlov N, Eckley DM, Macura T, Johnston J, Goldberg IG. Wndchrm: an open source utility for biological image analysis. Source Code Biol Med 2008; 3: 13.
29. Shamir L, Delaney JD, Orlov N, Eckley DM, Goldberg IG. Pattern recognition software and techniques for biological image analysis. PLoS Comput Biol 2010; 6: e1000974.