What Can We Learn from the RSNA Pediatric Bone Age Machine Learning Challenge?
2018; Radiological Society of North America; Volume: 290; Issue: 2; Language: English
10.1148/radiol.2018182657
ISSN 1527-1315
Reviews and Commentary: Editorial

Eliot L. Siegel

From the Departments of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201; and VA Maryland Healthcare System, Baltimore, Md. Address correspondence to the author.

Published online: Dec 4 2018

See also the article by Halabi et al in this issue.

Introduction

Franklin D. Roosevelt said, "Competition has been shown to be useful up to a certain point and no further, but cooperation, which is the thing we must strive for today, begins where competition leaves off."

Competition is an extraordinarily powerful driver of innovation, improvement, and success. Since as far back as 776 BC, the Olympic Games have inspired athletes to ever greater levels of performance. Evolutionary biologists believe that competitiveness coevolved in humans along with the struggle for survival (1).

The machine learning community in particular has harnessed the power of competition by issuing challenges, such as the annual ImageNet Large Scale Visual Recognition Challenge, which draws on the ImageNet database of more than 14 million hand-annotated photographs.
Kaggle, a company recently acquired by Google, has hosted hundreds of machine learning competitions involving thousands of participants. These participants frequently publish their winning strategies in academic journals and are awarded prize money in return for granting "a worldwide, perpetual, irrevocable and royalty-free license."

The Radiological Society of North America (RSNA) machine learning committee launched the RSNA Pediatric Bone Age Machine Learning Challenge and publicly recognized the teams that created the best-performing algorithms at the 2017 RSNA Annual Meeting. In this issue of Radiology, Halabi et al (2) describe the competition and its aims: to demonstrate an application of machine learning in medical imaging, to promote "collaboration to catalyze artificial intelligence model creation," and to "identify innovators in medical imaging." The challenge used medical images, clinical reports, and interpretations by four pediatric radiologists. These data had already been collected for a study by Larson et al (3), which was designed to demonstrate the use of deep learning to predict a child's bone age from a radiograph of the hand.

An annotated training data set of 12 611 pediatric hand radiographs was made available to competitors. The annotation included an expert consensus estimate of bone age and the patient's sex. Most contestants used deep learning, a type of machine learning that typically relies on convolutional neural networks when applied to images. The training data set was supplemented by an annotated validation data set of 1425 images. This set was intended to allow competitors to assess the accuracy of the algorithms they developed from the training images. Finally, a test data set of 200 images, provided without annotations, was used to compare the accuracy of the algorithms developed by each team.
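Comparing teams on the unannotated test set amounts to scoring each submission against reference bone ages that only the organizers hold. The sketch below is a minimal illustration. It assumes the metric is the mean absolute difference (MAD) in months, which was used to report the results. The function names and the four-image data set are hypothetical, and they are not the challenge's actual scoring code or data.

```python
def mean_absolute_difference(predicted, reference):
    """Mean absolute difference (MAD), in months, between predicted and reference bone ages."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predictions and reference must be non-empty and the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def rank_submissions(submissions, reference):
    """Score every team's predictions and sort from lowest (best) to highest MAD."""
    scores = {team: mean_absolute_difference(preds, reference)
              for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical withheld reference bone ages (months) for four test images.
reference = [120, 96, 150, 60]
submissions = {
    "team_a": [118, 100, 149, 65],  # absolute errors 2, 4, 1, 5
    "team_b": [125, 90, 155, 58],   # absolute errors 5, 6, 5, 2
}
print(rank_submissions(submissions, reference))  # [('team_a', 3.0), ('team_b', 4.5)]
```

Sorting by MAD produces a simple leaderboard. On this scale, the 3.5-day gap between the three best challenge entries corresponds to roughly 0.12 month of MAD.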
In all, 48 unique teams with a total of 105 contestants took part.

Ground truth in the original study (3) was established from the original clinical report and from the interpretations of four radiologists, including a second interpretation by one of them 1 year later. All of these interpretations used the Greulich and Pyle atlas matching method (4). They were combined with weighting factors that took into account each radiologist's estimated relative accuracy, as judged against the other interpretations.

The individual performance of the pediatric radiologists was judged by their mean absolute difference (MAD) from the weighted consensus score. MADs ranged from 5 to 7 months for the four readers. In comparison, the 10 best machine learning teams achieved MADs in a narrow range of 4.265–4.907 months, and the three best entries were separated by a remarkably thin margin of 3.5 days (2). All of the top 10 entries outperformed the machine learning model described in the original study (3), from which the data sets were derived. That model had an MAD of slightly more than 6 months. When the results of the second- and fourth-place teams were combined, the MAD improved to 4.00 months.

The approaches of the top five teams shared several common themes, and all except the fourth-place team used deep learning. One theme was data augmentation, in which a data set is enlarged by adding variants of its images. These variants can be created by flipping the images horizontally, vertically, or obliquely; applying image filters; or adding noise. Another theme was preprocessing, which could include breaking the hand images into subcomponents such as the fingers, metacarpals, and joint spaces. Finally, many teams created multiple algorithms and combined them to achieve higher accuracy than any one algorithm could achieve alone.
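Two of these themes, data augmentation and combining models, can be sketched in plain Python. The example below is a minimal illustration. It assumes images are stored as nested lists of pixel values. The code enlarges a labeled data set with three kinds of variants, each keeping the original bone age label: flipped copies, a transposed copy (taken here as one reading of an "oblique" flip across the diagonal), and a copy with added noise. It then forms an unweighted ensemble by averaging two models' predictions. All names and numbers are hypothetical, and this is not any team's actual pipeline. Image filtering and anatomical preprocessing are omitted.

```python
import random


def augmented_variants(image, rng, noise_sd=1.0):
    """Return label-preserving variants of a 2D image given as a list of pixel rows."""
    horizontal = [row[::-1] for row in image]        # mirror left-right
    vertical = [list(row) for row in image[::-1]]    # mirror top-bottom
    transposed = [list(col) for col in zip(*image)]  # flip across the main diagonal (assumed "oblique")
    noisy = [[px + rng.gauss(0.0, noise_sd) for px in row] for row in image]  # additive Gaussian noise
    return [horizontal, vertical, transposed, noisy]


def augment_dataset(samples, noise_sd=1.0, seed=0):
    """Enlarge a list of (image, bone_age_months) pairs; each variant keeps its label."""
    rng = random.Random(seed)  # seeded so the noisy copies are reproducible
    enlarged = []
    for image, age in samples:
        enlarged.append((image, age))
        enlarged.extend((variant, age) for variant in augmented_variants(image, rng, noise_sd))
    return enlarged


def ensemble_average(*model_predictions):
    """Unweighted ensemble: average each image's prediction across models."""
    return [sum(preds) / len(preds) for preds in zip(*model_predictions)]


def mad(predicted, reference):
    """Mean absolute difference in months."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Augmentation: one labeled 2x2 "image" becomes five training samples.
train = augment_dataset([([[1, 2], [3, 4]], 120)])
print(len(train))  # 5

# Ensembling: two hypothetical models whose errors point in opposite directions.
reference = [120, 96, 150, 60]
model_a = [126, 100, 146, 62]  # MAD 4.0
model_b = [116, 90, 154, 57]   # MAD 4.25
combined = ensemble_average(model_a, model_b)
print(mad(model_a, reference), mad(model_b, reference), mad(combined, reference))  # 4.0 4.25 0.625
```

Averaging helps most when the models' errors are only partly correlated. That is one plausible reason a deep learning entry combined with a conventional machine learning entry could outperform either alone. The toy numbers here deliberately exaggerate the effect.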
In machine learning, such combinations of multiple models are termed ensembles.

The RSNA Machine Learning Challenge offers many additional lessons. One important point is usually ignored by the media and often by authors of machine learning articles: radiologists are typically given a subtly different task from that of the deep learning or artificial intelligence algorithm. The clinically oriented task of the radiologists who set the reference standard was simply to determine the best atlas match for a given radiograph. The task of the machine learning algorithm, by contrast, was to predict the weighted mean score of the radiologists for a given radiograph. An algorithm trained toward that target can implicitly learn the readers' systematic tendencies. For example, a radiologist who knows that a colleague, say Alice, usually estimates high could shift a prediction of the mean score in that direction. The radiologists, however, are not asked to predict the consensus score. Because they are nonetheless scored by their MAD from it, they are at a disadvantage when compared with the machine learning algorithms.

Another challenge, well described in the literature (4), is the substantial difference in bone age estimates between the Greulich and Pyle atlas matching method and other methods, such as the Tanner-Whitehouse method. This is partly because Greulich and Pyle studied American children of high socioeconomic status in the 1940s, whereas Tanner and Whitehouse studied Scottish children of low socioeconomic status in the 1950s. Consider a database assembled in 2019 from an ethnically diverse population of children, whose disease may manifest earlier in puberty. It would probably also differ greatly from those older, less diverse databases.
This raises the question of the optimal way to assemble and validate a data set for a commercial version of bone age software intended for use in the United States and other parts of the world.

The extremely small margin between the five best methods raises several interesting questions. Have we reached the limits of the granularity of the atlas method itself for bone age? Or have we reached a plateau in deep learning for diagnostic imaging, where incremental advances in the technique yield minimal improvements in performance? Or, as Somers asks in the MIT Technology Review, is artificial intelligence riding a one-trick pony (5)? Will methods emerge in the next 5–20 years that substantially improve on the performance achieved with the data set used in the RSNA challenge? Should we aggressively encourage the development of novel approaches to create more effective and efficient algorithms from images? Revolutionary breakthroughs in hardware and software, such as quantum computing, promise to speed up some computations by many orders of magnitude. Will they have any effect on image analysis?

In conclusion, the inaugural RSNA Machine Learning Challenge was successful in meeting its goals of demonstrating machine learning in medical imaging, catalyzing model creation, and recognizing innovators. The choice of pediatric bone age determination worked very well on multiple levels to achieve those aims. It also demonstrated the potential to advance machine learning research by sharing a common data set and goal. The task is particularly well suited to machine learning for two reasons: bone age is a relatively well-defined quantitative assessment, and the anatomy in digital radiographs of the hand is relatively consistent and simple. Consequently, the use of machine learning to determine bone age from pediatric hand radiographs has been the subject of hundreds of research papers over the past 20 years.
It is also a relatively tedious, repetitive, and time-consuming clinical task, which makes it a good candidate for clinical implementation. Finally, and most importantly, it provides a compelling example of the research potential of sharing the raw data behind a published article with the research community, which opens the way to novel ideas and incremental improvements. Underscoring this point, the authors combined the results of the second-place (deep learning) and fourth-place (conventional machine learning) teams and achieved an accuracy that surpassed that of the first-place team. Such sharing of raw data is common in journals in other specialties. For example, the Journal of the Optical Society of America encourages authors to upload data sets to its portal as supplemental material. This practice could serve as an excellent model for the diagnostic imaging community's journals and their authors. It could also help the field move from competitions, such as the one described by Halabi et al, to an enduring culture in which researchers foster innovation and creativity among their colleagues by sharing their work. As President Roosevelt aptly observed, cooperation begins where competition leaves off.

Disclosures of Conflicts of Interest: E.L.S. disclosed no relevant relationships.

References

1. Nowak MA. Five rules for the evolution of cooperation. Science 2006;314(5805):1560–1563.
2. Halabi SS, Prevedello LM, Kalpathy-Cramer J, et al. The RSNA Pediatric Bone Age Machine Learning Challenge. Radiology 2019;290(2):498–503.
3. Larson DB, Chen MC, Lungren MP, Halabi SS, Stence NV, Langlotz CP. Performance of a deep-learning neural network model in assessing skeletal maturity on pediatric hand radiographs. Radiology 2018;287(1):313–322.
4. Bull RK, Edwards PD, Kemp PM, Fry S, Hughes IA. Bone age assessment: a large scale comparison of the Greulich and Pyle, and Tanner and Whitehouse (TW2) methods. Arch Dis Child 1999;81(2):172–173.
5. Somers J. Is AI riding a one-trick pony? MIT Technology Review. September 29, 2017.

Article History
Received: Nov 19 2018
Accepted: Nov 19 2018
Published online: Dec 04 2018
Published in print: Feb 2019