Keywords:
Artificial Intelligence, Breast, Mammography, Computer Applications-General, Cancer
Authors:
J. L. Liu1, R. L. Mimish2, Z. Kellow3, M. Thériault3, J. W. Luo1, S. Bhatnagar3, B. Gallix1, C. Reinhold1, J. J. R. Chong1; 1Montreal, QC/CA, 2Jeddah/SA, 3Montreal/CA
DOI:
10.26044/ecr2019/C-3512
Aims and objectives
Breast cancer is the most common cancer affecting women worldwide, and mammograms are routinely performed in screening and diagnostic examinations [1]. In comparison to fatty breasts, very dense breasts increase breast cancer risk by a factor of 4.64, as denser breasts contain more epithelial cells and exhibit greater epithelial proliferation [2,3]. Higher breast density also reduces mammographic sensitivity [4].
Recognizing the importance of breast density, thirty-one US states have implemented laws requiring radiologists to communicate breast density to patients following mammography [5].
Breast density is commonly reported using the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS), which divides area-based percent breast density into four categories, from almost entirely fatty (ACR Type A) to extremely dense (ACR Type D). Because this classification relies on visual interpretation, it often results in high intra- and inter-rater variability.
Two studies have shown that upon subsequent evaluation by the same radiologist, 23-29% of mammograms were classified differently; another study found that 32% of mammograms were classified differently by different radiologists [6].
Previous research has demonstrated significant correlation between automated algorithms and radiologists. Sartor et al. observed Cohen's kappa (κ) values of 0.77 among five radiologists and 0.55 between volumetric software and radiologists [7]. Lehman et al. developed a deep learning model with an agreement of 0.78 with the consensus of five radiologists [8]. Ciritsis et al. found a kappa of 0.91 between a network and two radiologists on mediolateral-oblique views, and 0.82 on craniocaudal views [9].
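The kappa values quoted above can be reproduced in a few lines. As a minimal sketch (the ratings below are hypothetical, not data from the cited studies), Cohen's kappa for two raters assigning BI-RADS density categories could be computed as:

```python
# Illustrative Cohen's kappa for inter-rater agreement on BI-RADS density.
# The category assignments below are invented for demonstration only.
from sklearn.metrics import cohen_kappa_score

# BI-RADS density categories A-D assigned to ten mammograms by two raters
rater_1 = ["A", "B", "B", "C", "D", "B", "C", "C", "A", "D"]
rater_2 = ["A", "B", "C", "C", "D", "B", "B", "C", "A", "D"]

# Cohen's kappa corrects raw agreement for agreement expected by chance.
# weights="linear" penalizes disagreements by how many categories apart
# they fall, which suits ordered categories such as BI-RADS A-D.
kappa = cohen_kappa_score(rater_1, rater_2, weights="linear")
print(round(kappa, 2))  # → 0.82
```

Unweighted kappa (the default) treats an A-versus-D disagreement the same as an A-versus-B one; the linear weighting is often preferred for ordinal scales like BI-RADS.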
The high rate of discrepancy among radiologists' mammogram classifications indicates a need for effective standardization to improve patient care. Algorithmic or quantitative density rating techniques may increase the consistency of breast density assessment. We aim to evaluate the agreement of convolutional neural networks (CNNs) with multi-radiologist expert consensus in stratifying breast density.
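To make the task concrete, a CNN for this problem maps a mammogram to logits over the four ACR density categories. The sketch below is an illustrative architecture under assumed settings (grayscale 224×224 input, two small convolutional blocks), not the network developed in this work:

```python
# Minimal sketch of a four-class breast-density CNN (ACR Types A-D).
# Architecture and input size are illustrative assumptions only.
import torch
import torch.nn as nn

class DensityCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        # Two small conv blocks, then global average pooling
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)  # logits over ACR Types A-D

# A random placeholder tensor standing in for one grayscale mammogram
model = DensityCNN()
logits = model(torch.randn(1, 1, 224, 224))
print(logits.shape)  # torch.Size([1, 4])
```

The predicted category is the argmax of the logits; agreement with radiologist consensus can then be summarized with Cohen's kappa as in the studies cited above.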