
The ability to adapt to a changing environment is essential in medicine, for humans and algorithms alike. Even the best of both struggle from time to time.
This was the case recently for an award-winning deep learning model touted for its excellent performance in pediatric bone age prediction. When researchers subjected the model, winner of the 2017 RSNA Pediatric Bone Age Machine Learning Challenge, to a “stress test” of sorts, its performance raised questions about how it might fare in the real world.
The research was detailed recently in a paper published in Radiology: Artificial Intelligence.
“Despite radiologist-level performance for medical imaging diagnosis, DL models’ robustness to both extreme and clinically encountered image variation has not been thoroughly evaluated,” corresponding author Paul H. Yi, from the Department of Diagnostic Radiology and Nuclear Medicine at the University of Maryland School of Medicine, and co-authors explained. “In clinical practice, image acquisition is variable, and there is no standard of orientation or postprocessing, which could be an overlooked source of error for DL models in radiology.”
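For illustration, here is a minimal sketch of what such an acquisition-variation stress test might look like in code. The function name, perturbation choices, and the `predict_bone_age` callable are assumptions for the example, not the study’s actual methodology:

```python
# Illustrative sketch: simulate clinically plausible acquisition variations
# (non-standard orientation, altered postprocessing) on a hand radiograph
# before running inference. Names and perturbations are hypothetical.
from PIL import Image, ImageEnhance

def acquisition_variants(radiograph: Image.Image) -> dict[str, Image.Image]:
    """Return perturbed copies simulating variable orientation/postprocessing."""
    return {
        "original": radiograph,
        "rotated_90": radiograph.rotate(90, expand=True),
        "flipped": radiograph.transpose(Image.Transpose.FLIP_LEFT_RIGHT),
        "low_contrast": ImageEnhance.Contrast(radiograph).enhance(0.5),
        "brightened": ImageEnhance.Brightness(radiograph).enhance(1.5),
    }

# Usage (assumes `predict_bone_age` wraps the trained model):
# image = Image.open("hand_xray.png").convert("L")
# for name, variant in acquisition_variants(image).items():
#     print(name, predict_bone_age(variant))
```

Comparing predictions across such variants surfaces sensitivities that a standard held-out test set, where every image is acquired and processed the same way, would miss.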
The researchers first tested the model on two datasets: the RSNA validation set, which contains more than 1,400 pediatric hand radiographs, and the Digital Hand Atlas (DHA), which includes more than 1,200. As expected, the model performed well on both, “indicating good model generalization to external data,” the authors wrote.
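In code, this kind of baseline external validation amounts to computing the error between predicted and reference bone ages on each dataset. A minimal sketch follows, in which `predict_bone_age` and the dataset variables are stand-ins rather than the study’s actual code:

```python
# Hypothetical sketch of external validation: mean absolute error of
# predicted vs. reference bone ages over a dataset of labeled cases.
from statistics import mean

def evaluate_mae(model, cases):
    """cases: iterable of (image, reference_bone_age_in_months) pairs."""
    return mean(abs(model(image) - reference) for image, reference in cases)

# mae_rsna = evaluate_mae(predict_bone_age, rsna_validation_cases)
# mae_dha = evaluate_mae(predict_bone_age, digital_hand_atlas_cases)
# Comparable error on both datasets is what "good generalization
# to external data" looks like in this setup.
```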