Automated BI-RADS Density Classification with Convolutional Neural Networks Demonstrates Strong Agreement with Human Radiologist Consensus

Congress:

ECR 2019

Poster Number:

C-3512

Type:

Scientific Exhibit

Keywords:

Artificial Intelligence, Breast, Mammography, Computer Applications-General, Cancer

Authors:

J. L. Liu¹, R. L. Mimish², Z. Kellow³, M. Thériault³, J. W. Luo¹, S. BHATNAGAR³, B. Gallix¹, C. Reinhold¹, J. J. R. Chong¹; ¹Montreal, QC/CA, ²Jeddah/SA, ³Montreal/CA

DOI:

10.26044/ecr2019/C-3512

DOI-Link:

https://dx.doi.org/10.26044/ecr2019/C-3512

Fig. 2: Example of neural network inference and class-activation map ("heat map"),...

Fig. 3: Inter-rater reliability agreement analysis of the DCNN versus Radiologists.

Fig. 4: Kendall's coefficient of concordance weights analysis of the DCNN versus 3...

Results

Network Performance:

The CNN yielded an accuracy performance of 85.0%, with individual density test AUC’s ranging from 0.935-0.998 (Fig. 2). On the held-out set of this specific investigation, the final trained network obtained a 90.8% accuracy when evaluated relative to the original clinical report label.

Inter-Rater Human Reviewer Agreement:

Individual human reviewer agreement with the average consensus was overall excellent, with individuals reporting Cohen’s Kappa coefficient from 0.91-0.94 (Fig. 3). While some of this is attributable to the derivation of consensus from the individual rating themselves, no major discrepancies or outliers were noted amongst the reviewers.

AI vs Human Reviewer Agreement:

IRR analysis with a linearly weighted Cohen’s kappa demonstrated very good agreement (κ=0.87) between the CNN and the mammographers consensus, however with lesser agreement than the lowest scoring radiologist’s agreement (Fig. 3: Radiologist 3; κ=0.91). Kendall’s Coefficient of Concordance weight yielded a weight of 0.945 between the CNN and consensus readings (Fig. 4).