Stress testing a deep learning algorithm for normal/abnormal classification of chest x-rays on a spectrum-biased abnormal-weighted dataset

Congress:

ECR 2019

Poster Number:

C-2743

Type:

Scientific Exhibit

Keywords:

Artificial Intelligence, Lung, Digital radiography, Computer Applications-General, Infection

Authors:

V. Venugopal¹, R. L. González², L. Marti-Bonmati², A. Alberich-Bayarri², M. Barnwal¹, V. Mahajan¹; ¹New Delhi/IN, ²Valencia/ES

DOI:

10.26044/ecr2019/C-2743

DOI-Link:

https://dx.doi.org/10.26044/ecr2019/C-2743

Fig. 1: A visual diagram of the model approach.

Fig. 2: Architecture of the proposed CNN for pathology-specific binary classification.

Fig. 3: Heatmaps for different pathologies. The upper row shows the input images to...

Aims and objectives

To stress test a deep learning chest X-ray classifier (normal vs. abnormal) on a dataset with spectrum bias against normalcy, i.e. one containing more abnormal than normal studies.

Methods and materials

A deep learning algorithm consisting of an ensemble of 14 convolutional neural networks (CNNs) and a weighting fully connected network (Fig. 1) was trained on more than 112,000 chest X-ray studies labelled with one or more of 14 defined thoracic pathologies. The 14 CNNs were based on the VGG-19 architecture (Fig. 2), and transfer learning from the ImageNet dataset was used to accelerate convergence and improve the performance of the algorithm. The output of the algorithm was the probability of an input image of being...
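As a rough illustration of the weighting step only, the sketch below combines 14 per-pathology probabilities into a single score using one dense layer followed by a sigmoid. The function name `combine_scores`, the single-layer form, and the weight and bias values are assumptions made for illustration; the source does not describe the size or parameters of the weighting network, and the 14 VGG-19 classifiers are not reproduced here.

```python
import math

N_PATHOLOGIES = 14  # number of pathology-specific CNNs stated in the text


def sigmoid(z: float) -> float:
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def combine_scores(probs, weights, bias):
    """Combine per-pathology probabilities into one score.

    Hypothetical stand-in for the weighting fully connected network:
    a single dense layer with a sigmoid output.
    """
    if len(probs) != N_PATHOLOGIES or len(weights) != N_PATHOLOGIES:
        raise ValueError(f"expected {N_PATHOLOGIES} probabilities and weights")
    z = bias + sum(w * p for w, p in zip(weights, probs))
    return sigmoid(z)


# Placeholder parameters, NOT learned values from the study.
weights = [1.0] * N_PATHOLOGIES
bias = -1.0

# All 14 classifiers report a low probability.
low = combine_scores([0.02] * N_PATHOLOGIES, weights, bias)
# One classifier reports a high probability, the rest stay low.
high = combine_scores([0.02] * 13 + [0.95], weights, bias)

print(f"all low: {low:.3f}, one high: {high:.3f}")
```

With these placeholder weights, a single confident pathology classifier is enough to raise the combined score. In a trained network the weights would be learned, so the relative influence of each pathology could differ.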

Results

The algorithm correctly classified 237 (78.74%) CXRs, with a sensitivity of 83.76% (95% CI: 77.85% to 88.62%) and a specificity of 69.23% (95% CI: 59.42% to 77.91%). There were equal numbers of false-positive and false-negative cases: 32 each (13.5%). For screening applications, sensitivity is crucial because overlooking a pathology may have severe consequences for patients. It is therefore encouraging that, under stress testing, the system favours sensitivity over specificity.
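The reported percentages can be checked against a confusion matrix. The sketch below uses counts inferred from those percentages (165 true positives, 72 true negatives, 32 false positives, 32 false negatives, 301 studies in total); these counts are not stated in the source. It also assumes the confidence intervals are exact (Clopper-Pearson) binomial intervals, which the source does not specify.

```python
from math import comb

# Counts inferred from the reported percentages (not stated in the source).
TP, FN = 165, 32  # abnormal studies: 197
TN, FP = 72, 32   # normal studies: 104


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval, found by bisection."""

    def solve(k_cdf, target):
        # binom_cdf decreases as p grows, so bisect on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k_cdf, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + FN + TN + FP)
sens_ci = clopper_pearson(TP, TP + FN)
spec_ci = clopper_pearson(TN, TN + FP)

print(f"accuracy    {accuracy:.2%}")
print(f"sensitivity {sensitivity:.2%} (95% CI {sens_ci[0]:.2%} to {sens_ci[1]:.2%})")
print(f"specificity {specificity:.2%} (95% CI {spec_ci[0]:.2%} to {spec_ci[1]:.2%})")
```

Under these inferred counts, the point estimates round to the reported 78.74%, 83.76% and 69.23%. The reported 13.5% appears to correspond to 32 out of the 237 correctly classified studies; 32 out of all 301 studies would be about 10.6%.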

Conclusion

Compared with the validation results, the deep learning algorithm performed better on the stress test, which used a biased dataset containing more abnormal than normal scans.
