Automated BI-RADS Density Classification with Convolutional Neural Networks Demonstrates Strong Agreement with Human Radiologist Consensus

Congress:

ECR 2019

Poster Number:

C-3512

Type:

Scientific Exhibit

Keywords:

Artificial Intelligence, Breast, Mammography, Computer Applications-General, Cancer

Authors:

J. L. Liu¹, R. L. Mimish², Z. Kellow³, M. Thériault³, J. W. Luo¹, S. Bhatnagar³, B. Gallix¹, C. Reinhold¹, J. J. R. Chong¹; ¹Montreal, QC/CA, ²Jeddah/SA, ³Montreal/CA

DOI:

10.26044/ecr2019/C-3512

DOI-Link:

https://dx.doi.org/10.26044/ecr2019/C-3512

Fig. 1: Training and Evaluation Experiment Design for Mammographic Density Evaluation

Fig. 2: Example of neural network inference and class-activation map ("heat map"),...

Fig. 3: Inter-rater reliability agreement analysis of the DCNN versus Radiologists.

Fig. 4: Kendall's coefficient of concordance weights analysis of the DCNN versus 3...
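Fig. 4 summarizes agreement using Kendall's coefficient of concordance (W). For reference, the standard tie-corrected definition is shown below. It covers m raters (for example, the DCNN and the human readers) who each rank the same n studies. The caption does not describe how the weights analysis applied W, so treat this as background rather than the authors' exact computation.

```latex
R_i = \sum_{j=1}^{m} r_{ij}, \qquad
\bar{R} = \frac{m(n+1)}{2}, \qquad
S = \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2

W = \frac{12\,S}{m^2 \left( n^3 - n \right) - m \sum_{j=1}^{m} T_j},
\qquad
T_j = \sum_{k=1}^{g_j} \left( t_{jk}^3 - t_{jk} \right)
```

Here r_ij is the (mid-)rank that rater j assigns to study i. g_j is the number of groups of tied ranks in rater j's ranking, and t_jk is the size of the k-th such group. W runs from 0 (no agreement) to 1 (complete agreement). Density is rated on only four categories, so ties are extensive and the correction term T_j is not negligible. Without ties, every T_j is 0 and the denominator reduces to m²(n³ − n).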

Aims and objectives

Breast cancer is the most common cancer affecting women worldwide, and mammography is routinely performed for both screening and diagnostic examinations [1]. Compared with fatty breasts, very dense breasts increase breast cancer risk by a factor of 4.64, as denser breasts contain more epithelial cells and show greater epithelial proliferation [2,3]. Higher breast density also reduces mammographic sensitivity [4]. Recognizing the importance of breast density, thirty-one US states have enacted laws requiring radiologists to communicate breast density to patients following mammography [5]. Breast density is commonly reported...

Methods and materials

Study Population: Relevant mammograms were retrieved from PACS by keyword search. Exclusion criteria were applied to mammograms with pacemakers, breast implants, non-standard views (such as compression and magnification views), and examinations with technical quality issues. The dataset included 69,202 mammograms from 4,851 patients, with an average of 3.69 images per study. From this dataset, 200 four-view studies, totaling 800 images, were reserved to evaluate the performance of the CNN and the radiologist reviewers (Fig. 1). The remaining 68,402 mammograms were used to train the network.

Image Pre-Processing...
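The study-population steps described above can be sketched as follows. First, image-level exclusions are applied. Second, complete four-view studies are reserved for evaluation. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `Image` record, its flag fields, the view labels in `STANDARD_VIEWS`, the seeded random selection, and `split_holdout` are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical view labels for a standard four-view exam (CC and MLO of each breast).
STANDARD_VIEWS = frozenset({"L-CC", "L-MLO", "R-CC", "R-MLO"})


@dataclass
class Image:
    """Hypothetical per-image record; field names are illustrative, not the authors' schema."""
    image_id: str
    study_id: str
    view: str
    has_pacemaker: bool = False
    has_implant: bool = False
    nonstandard_view: bool = False  # e.g. compression or magnification view
    quality_issue: bool = False


def is_eligible(img: Image) -> bool:
    """Exclusion criteria from the Methods: pacemakers, implants, non-standard views, poor quality."""
    return not (img.has_pacemaker or img.has_implant
                or img.nonstandard_view or img.quality_issue)


def split_holdout(images, n_test=200, seed=0):
    """Reserve n_test complete four-view studies for evaluation; other eligible images go to training."""
    eligible = [img for img in images if is_eligible(img)]

    # Collect the eligible views belonging to each study.
    views_by_study = defaultdict(list)
    for img in eligible:
        views_by_study[img.study_id].append(img.view)

    # Only studies with exactly the four standard views can enter the hold-out set.
    four_view_ids = sorted(
        sid for sid, views in views_by_study.items()
        if len(views) == 4 and frozenset(views) == STANDARD_VIEWS
    )
    # Hypothetical choice: seeded random selection, so the split is reproducible.
    test_ids = set(random.Random(seed).sample(four_view_ids, n_test))

    test = [img for img in eligible if img.study_id in test_ids]
    train = [img for img in eligible if img.study_id not in test_ids]
    return train, test
```

This sketch splits by study, so a patient's other studies can still appear in training. The text does not say whether the original held-out set was separated at the patient level. As a consistency check on the reported counts, 200 studies × 4 views = 800 images, and 69,202 − 800 = 68,402 images remain for training.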

Results

Network Performance: The CNN achieved an accuracy of 85.0%, with per-category test AUCs ranging from 0.935 to 0.998 (Fig. 2). On the held-out set of this investigation, the final trained network achieved 90.8% accuracy relative to the density label in the original clinical report.

Inter-Rater Human Reviewer Agreement: Agreement between each individual human reviewer and the average consensus was excellent overall, with Cohen's kappa coefficients ranging from 0.91 to 0.94 (Fig. 3). While some of this is attributable to the derivation of consensus from the individual rating...
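The Results report per-category AUCs and Cohen's kappa. The sketch below shows how these two statistics are commonly computed. It rests on two assumptions that the text does not state: AUCs are computed one-vs-rest from per-category network scores, and kappa is unweighted. The names `DENSITY_CLASSES`, `binary_auc`, `one_vs_rest_aucs`, and `cohens_kappa` are hypothetical.

```python
from collections import Counter

# Hypothetical label order for the four BI-RADS density categories (a-d).
DENSITY_CLASSES = ("A", "B", "C", "D")


def binary_auc(scores, is_positive):
    """Rank-based AUC: probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(probs, labels, classes=DENSITY_CLASSES):
    """One AUC per density category: that category is positive, all others negative."""
    return {
        c: binary_auc([row[k] for row in probs], [y == c for y in labels])
        for k, c in enumerate(classes)
    }


def cohens_kappa(rater_a, rater_b, classes=DENSITY_CLASSES):
    """Unweighted Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Density categories are ordinal, so a linearly or quadratically weighted kappa is a common alternative. scikit-learn's `cohen_kappa_score` accepts `weights="linear"` or `"quadratic"`, and its `roc_auc_score`, applied to each category binarized against the rest, can serve as a cross-check on these hand-written versions.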

Conclusion

Limitations: We trained the CNN on density labels from radiology reports, which are subject to the inherent subjectivity of human readers. In addition, the long study period may have introduced bias, as BI-RADS density categories and clinical standards changed over the time span of the dataset. The CNN's predictions may have been better calibrated to the original interpreting radiologists, who, despite following the same density classification guidelines, could have had different practices from our three radiologist readers. Initial reports were evaluated by different readers, and could...

References

[1] Global Burden of Disease Cancer Collaboration. The Global Burden of Cancer 2013. JAMA Oncol. 2015 Jul;1(4):505-27. doi: 10.1001/jamaoncol.2015.0735.
[2] Turashvili G, McKinney S, Martin L, et al. Columnar cell lesions, mammographic density and breast cancer risk. Breast Cancer Res Treat. 2009 Jun;115(3):561-71.
[3] McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev. 2006 Jun;15(6):1159-69.
[4] Wanders JOP, Holland K, Veldhuis WB, et al. (2017). Volumetric breast density affects performance of...