Utilizing Deep Learning to Standardize Annotation and Labeling of Large Abdominal CT Multi-Centre Datasets

Congress:

ECR 2019

Poster Number:

C-3057

Type:

Scientific Exhibit

Keywords:

Computer Applications-General, CT, Liver, Artificial Intelligence, Abdomen, Image verification

Authors:

R. Remtulla¹, S. L. Mihalcioiu², J. W. Luo², B. Gallix², J. J. R. Chong²; ¹Montreal, Quebec/CA, ²Montreal, QC/CA

DOI:

10.26044/ecr2019/C-3057

DOI-Link:

https://dx.doi.org/10.26044/ecr2019/C-3057

Conclusion

Study Limitations

Ideal classification experiments in machine learning use classes of equal sample size. With unequal sample sizes, a model may classify images correctly on the basis of overall population probability, irrespective of image content. Our model may therefore inherently favour classifying scans as non-axial or axial non-liver CT slices over delayed or non-enhanced axial liver CT slices, although this did not appear to be a major contaminant in the test set classification characteristics. In addition, we anticipate that when inferences are aggregated over multiple slices in a single series, the majority class vote for that series would likely remain stable even if the accuracy on any given slice changes.
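The series-level majority vote described above can be illustrated with a minimal sketch. The function name, the class label strings, and the per-slice predictions below are hypothetical and chosen only for illustration; they are not taken from the study's implementation.

```python
from collections import Counter


def majority_vote(slice_predictions):
    """Return the most frequent per-slice label and its share of the votes.

    slice_predictions: list of class labels, one per slice in a series.
    Ties are resolved in favour of the label encountered first.
    """
    if not slice_predictions:
        raise ValueError("series has no slice predictions")
    counts = Counter(slice_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(slice_predictions)


# Hypothetical per-slice outputs for a single 10-slice series:
# a few slices are misclassified, but the series-level label is unchanged.
series = ["delayed"] * 7 + ["non_enhanced"] * 2 + ["axial_non_liver"]
label, share = majority_vote(series)
print(label, share)
```

In this example three of the ten slices disagree with the majority, yet the series is still assigned a single label; the vote share could serve as a rough indicator of how contested that label is.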

Future Directions

Further experimentation will explore the impact of class imbalance and the effect of aggregating slice-level inferences into a single prediction for a full series volume. Such a system would additionally benefit from repeat validation on a secondary external dataset. In addition, given the cost in average inference time, we believe that a practical, efficient implementation may employ series sub-sampling strategies that minimize inference time while preserving overall series inclusion/exclusion accuracy.
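One possible sub-sampling strategy is sketched below, assuming evenly spaced slices are a reasonable proxy for the whole series. This is an assumption for illustration, not a strategy evaluated in the study. The names `evenly_spaced_indices`, `classify_series`, and the `classify_slice` callable are hypothetical; `classify_slice` stands in for whatever per-slice model is used.

```python
from collections import Counter


def evenly_spaced_indices(n_slices, k):
    """Pick up to k distinct slice indices spread evenly over a series."""
    if n_slices <= 0 or k <= 0:
        return []
    k = min(k, n_slices)
    if k == 1:
        return [n_slices // 2]  # single sample: take the middle slice
    # Integer arithmetic keeps indices distinct and within [0, n_slices - 1].
    return [(i * (n_slices - 1)) // (k - 1) for i in range(k)]


def classify_series(slices, classify_slice, k=5):
    """Classify a series from k sub-sampled slices by majority vote.

    classify_slice is a hypothetical callable mapping one slice to a label.
    Only k slices are passed to it, so inference cost scales with k
    rather than with the number of slices in the series.
    """
    indices = evenly_spaced_indices(len(slices), k)
    if not indices:
        raise ValueError("series has no slices to classify")
    votes = Counter(classify_slice(slices[i]) for i in indices)
    return votes.most_common(1)[0][0]


# Toy usage: each "slice" is already its label, so the classifier is identity.
toy_series = ["non_enhanced"] * 60 + ["axial_non_liver"] * 40
print(evenly_spaced_indices(len(toy_series), 5))
print(classify_series(toy_series, lambda s: s, k=5))
```

The choice of `k` trades inference time against the risk that the sampled slices miss the anatomy of interest; whether a given `k` preserves series-level accuracy would need to be checked empirically.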

System Benefits

A fully developed model has significant application in the preparatory stages of traditional machine learning experiments. Primarily, a model could be used to visually presort images into relevant classes directly from PACS, which would aid in preparing the large, accurate databases needed for training, prior to further human review and labeling. Furthermore, a similar model could be developed to identify and remove low-quality images or images that fail to satisfy an image quality criterion for inclusion, for instance inadequate bolus timing; training on such images may mislead a machine learning model and hamper overall performance. We anticipate that visual presorting strategies such as the one proposed here, in conjunction with commonly employed rules-based or metadata-based strategies, will prove critical in curating the next generation of AI training cohorts.

Conclusion

A deep convolutional neural network transfer learning approach is able to quickly and accurately standardize views and phases of contrast wholly independent of image labeling assumptions, permitting the automated preparation of large datasets for machine learning.