Automated BI-RADS Density Classification with Convolutional Neural Networks Demonstrates Strong Agreement with Human Radiologist Consensus

Congress:

ECR 2019

Poster Number:

C-3512

Type:

Scientific Exhibit

Keywords:

Artificial Intelligence, Breast, Mammography, Computer Applications-General, Cancer

Authors:

J. L. Liu¹, R. L. Mimish², Z. Kellow³, M. Thériault³, J. W. Luo¹, S. Bhatnagar³, B. Gallix¹, C. Reinhold¹, J. J. R. Chong¹; ¹Montreal, QC/CA, ²Jeddah/SA, ³Montreal/CA

DOI:

10.26044/ecr2019/C-3512

DOI-Link:

https://dx.doi.org/10.26044/ecr2019/C-3512

Conclusion

Limitations:

The CNN was trained on density labels taken from radiology reports, which are subject to the inherent subjectivity of human readers. Further bias may have been introduced by the long time period covered by the dataset, during which BI-RADS labels and clinical standards changed. The CNN's predictions were possibly better calibrated to the original interpreting radiologists, who, despite following the same density classification guidelines, could have had different practices than our three radiologist readers. The initial reports were written by different readers and could have been influenced by draft density proposals from trainees, which may be readily accepted unless markedly discrepant. Interestingly, the overall accuracy of the CNN relative to the original report labels was >90%, suggesting that a component of this miscalibration may be recoverable and could be corrected with more consensus examples.

Future Directions:

A CNN-based approach appears highly applicable to the objective assessment of breast density: density assessment requires only low spatial resolution, which makes it well suited to CNN methods that operate on downsampled images. Our results are in accordance with those of Ciritsis et al. and Lehman et al., and we found better agreement between the CNN and radiologist consensus than Sartor et al. The network's classification performance relative to the inter-rater variability of human reviewers suggests that the minor losses in performance of our model could be due to calibration issues arising in training rather than any fundamental issue with the network methodology itself. Future work will investigate this possibility and explore methods to quickly re-calibrate the model so that its agreement with consensus falls within the level of variability between human expert reviewers.

Conclusion:

In conclusion, a CNN transfer learning approach can automatically determine breast density, showing very good agreement with the majority consensus breast density rating of radiologists.
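As an illustration of how agreement with a majority consensus might be quantified, the following minimal Python sketch derives a consensus label from three readers and compares it with model predictions using accuracy and unweighted Cohen's kappa. It is not the authors' evaluation code: the function names, the A to D category encoding, the example ratings, and the choice to exclude cases where all three readers disagree are assumptions made for this example, and the poster does not state which agreement statistic or tie-breaking rule was used.

```python
from collections import Counter

# Assumed encoding of the four BI-RADS density categories.
CATEGORIES = ["A", "B", "C", "D"]


def majority_consensus(ratings):
    """Return the label chosen by more than half of the readers, else None."""
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


def accuracy(y_ref, y_pred):
    """Fraction of cases where the prediction matches the reference label."""
    return sum(r == p for r, p in zip(y_ref, y_pred)) / len(y_ref)


def cohen_kappa(y_a, y_b, categories=CATEGORIES):
    """Unweighted Cohen's kappa between two label sequences."""
    n = len(y_a)
    observed = sum(a == b for a, b in zip(y_a, y_b)) / n
    count_a, count_b = Counter(y_a), Counter(y_b)
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: one tuple of three reader labels per mammogram.
reader_ratings = [
    ("A", "A", "B"),
    ("B", "C", "C"),
    ("C", "C", "C"),
    ("D", "D", "C"),
    ("A", "B", "C"),  # no majority, so this case is excluded below
]
cnn_predictions = ["A", "C", "B", "D", "B"]

consensus = [majority_consensus(r) for r in reader_ratings]

# Keep only cases with a defined majority label (an assumption of this sketch).
pairs = [(c, p) for c, p in zip(consensus, cnn_predictions) if c is not None]
ref_labels = [c for c, _ in pairs]
pred_labels = [p for _, p in pairs]

print("accuracy:", accuracy(ref_labels, pred_labels))
print("kappa:", cohen_kappa(ref_labels, pred_labels))
```

Because BI-RADS density categories are ordinal, a weighted kappa that penalises disagreements by their distance may be a more informative choice in practice; the unweighted form is used here only to keep the sketch short.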