Quantifying the rate of positive findings in CT pulmonary embolism examinations

Congress:

ECR 2018

Poster Number:

C-0108

Type:

Scientific Exhibit

Keywords:

Computer applications, Management, Lung, Neural networks, CT, RIS, Computer Applications-General, Cost-effectiveness, Statistics, Embolism / Thrombosis, Quality assurance, Economics

Authors:

E. Sjöblom¹, C. Lundström¹, M. Andersson¹, N. Carius², J. Taghia³; ¹Linköping/SE, ²Ljungsbro/SE, ³Uppsala/SE

DOI:

10.1594/ecr2018/C-0108

DOI-Link:

https://dx.doi.org/10.1594/ecr2018/C-0108

Fig. 1: Screenshot of system used to perform categorization of the radiology reports.

Fig. 3: A noisy channel that perturbs the input U according to the probabilities given...

Fig. 9: The posterior estimated from different number of samples. The dashed line...

Aims and objectives

In the management of radiology operations, it is important to understand the effectiveness of diagnostic pathways. One source of data to underpin managerial decision-making is the radiology report, but large-scale mining of such data is a challenge. The challenge lies in extracting actionable data from the free-text format of most radiology reports [3]. Here we study the problem of quantifying the rate of positive findings in CT pulmonary embolism examinations. The number of examinations with positive findings versus the total number of...

Methods and materials

Report data: Reports for CT pulmonary embolism examinations from a university hospital were collected (n=22305). The reports were categorized into five categories by two medical experts. The experts had access only to the clinical history and the report text, not to the images (Fig. 1). The categories (decided on by the medical experts) were: Embolism (n=3512); No embolism (n=16890); Previously known embolism/No increase (n=160); Uncertain (n=289); Not applicable (n=1454; these include reports for examinations that had been incorrectly marked as CT pulmonary embolism examinations). The dataset was...
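As a quick check of the figures above, the following sketch sums the category counts and derives a raw rate of positive findings from the expert labels. The counts are taken from the text; treating only the "Embolism" category as a positive finding, and optionally excluding "Not applicable" reports from the denominator, are assumptions made here for illustration and may not match the definition used in the study.

```python
# Category counts as reported in the text above.
counts = {
    "Embolism": 3512,
    "No embolism": 16890,
    "Previously known embolism/No increase": 160,
    "Uncertain": 289,
    "Not applicable": 1454,
}

total = sum(counts.values())

# Assumption: only the "Embolism" category counts as a positive finding.
positive_rate = counts["Embolism"] / total

# Alternative assumption: drop "Not applicable" reports from the denominator,
# since they were not really CT pulmonary embolism examinations.
applicable_total = total - counts["Not applicable"]
positive_rate_applicable = counts["Embolism"] / applicable_total

print(f"total reports: {total}")
print(f"positive rate (all reports): {positive_rate:.4f}")
print(f"positive rate (applicable reports only): {positive_rate_applicable:.4f}")
```

The category counts add up to the stated total of 22305 reports, and the two denominators give slightly different rates, which is why the choice of denominator should be stated alongside any reported rate.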

Results

Report classification: The report classifier was evaluated on the test set and reached an accuracy of 98% and an F1-score of 0.93. Fig. 7 shows examples of reports classified correctly and incorrectly. The classifier mainly struggles with reports where an addendum contradicts the original statement and with very long reports (often because the report covers multiple reasons for examination). Confusion matrix: To estimate the transition probabilities for the classifier, we compute the confusion matrix on the development set (keeping the test...
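To illustrate how transition probabilities can be read off a confusion matrix, the sketch below row-normalizes a matrix of counts so that each row gives P(predicted class | true class). The function name `transition_probabilities` and the binary example counts are hypothetical, invented for illustration; they are not the development-set results of this study, which uses five categories.

```python
def transition_probabilities(confusion):
    """Row-normalize a confusion matrix of counts.

    confusion[i][j] is the number of reports with true class i that the
    classifier assigned to class j. The result holds P(predicted j | true i).
    """
    probabilities = []
    for row in confusion:
        row_total = sum(row)
        if row_total == 0:
            raise ValueError("a true class with no examples cannot be normalized")
        probabilities.append([count / row_total for count in row])
    return probabilities


# Hypothetical counts, NOT taken from the poster.
# Rows: true [no embolism, embolism]; columns: predicted [no embolism, embolism].
example_confusion = [
    [950, 10],
    [8, 132],
]

P = transition_probabilities(example_confusion)
for true_label, row in zip(["no embolism", "embolism"], P):
    print(true_label, [round(p, 4) for p in row])
```

Each row of `P` sums to one, so the matrix can be used directly as the channel that maps true labels to classifier outputs.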

Conclusion

The proposed method can accurately estimate the rate of positive findings in a set of reports. The method takes into account that using any type of machine learning classifier to predict if a report includes a positive finding will introduce a source of error. It tries to mitigate this source of error by utilizing the confusion matrix of the classifier. The model can be adapted to different scenarios by injecting additional information into the shape of the prior on the observed rate of positive findings....
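The idea of correcting a classifier-based rate with the confusion matrix can be sketched with a grid approximation of the posterior. This is a simplified binary illustration under stated assumptions, not the authors' model: the true positive rate `tpr` and false positive rate `fpr` are treated as known constants, the prior defaults to uniform, and the function name `posterior_rate` and all example numbers are hypothetical.

```python
import math


def posterior_rate(n_pred_pos, n_total, tpr, fpr, grid_size=1001, prior=None):
    """Grid posterior over the true positive rate theta.

    Assumes the classifier flags a report as positive with probability
    q = theta * tpr + (1 - theta) * fpr, and that the number of flagged
    reports is binomial in q. Returns (thetas, posterior).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    if prior is None:
        prior = [1.0] * grid_size  # assumption: uniform prior on theta

    log_post = []
    for theta, p0 in zip(thetas, prior):
        if p0 <= 0:
            log_post.append(float("-inf"))
            continue
        q = theta * tpr + (1 - theta) * fpr
        q = min(max(q, 1e-12), 1 - 1e-12)  # guard against log(0)
        log_lik = n_pred_pos * math.log(q) + (n_total - n_pred_pos) * math.log(1 - q)
        log_post.append(math.log(p0) + log_lik)

    # Normalize in a numerically stable way.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return thetas, [w / norm for w in weights]


# Hypothetical example: 160 of 1000 reports flagged positive by a classifier
# with tpr = 0.94 and fpr = 0.01 (illustrative values, not from the poster).
thetas, posterior = posterior_rate(160, 1000, tpr=0.94, fpr=0.01)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
naive_rate = 160 / 1000
print(f"naive rate: {naive_rate:.3f}, posterior mean: {posterior_mean:.3f}")
```

Passing a non-uniform `prior` list is one way to inject additional information about the expected rate, in the spirit of the adaptation described above. A fuller treatment would also propagate the uncertainty in the confusion matrix itself rather than fixing `tpr` and `fpr`.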

Personal information

E. Sjöblom, MSc CSE, Sectra AB, Teknikringen 20, SE-58330 Linköping, SWEDEN. Phone: +4613235200 Fax: +4613212185 Email: [email protected]

References

[1] Lovins, J. B. (1968). Development of a stemming algorithm. Mech. Translat. & Comp. Linguistics, 11(1-2), 22-31.
[2] Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825-2830.
[3] Pons et al. (2016). Natural Language Processing in Radiology: A Systematic Review. Radiology.
[4] Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379–423, 623–656.
[5] Downey, A. B., Loukides, M. & Spencer, A. (eds.) (2013). Think Bayes. Sebastopol, California: O'Reilly Media.