Inter-rater variability in the assessment of positioning quality in chest x-ray images

Congress:

ECR 2020

Poster Number:

C-05601

Type:

Scientific Exhibit

Keywords:

Multicentre study, Observational, Not applicable, Quality assurance, Comparative studies, Digital radiography, Thorax, Professional issues, Artificial Intelligence, Chest

Authors:

M. Englmaier¹, D. Sasse¹, D. Pfeiffer¹, M. Kotnik², L. Lin², H. J. Lamb², J. Conradsen³, J. Fløtten³, N. Wieberneit⁴; ¹Munich/DE, ²Leiden/NL, ³Herning/DK, ⁴Hamburg/DE

DOI:

10.26044/ecr2020/C-05601

DOI-Link:

https://dx.doi.org/10.26044/ecr2020/C-05601

Fig. 5: Distribution of image ratings for rotation, 2 raters Hospital 3

Fig. 6: Confusion matrix comparing rating of 2 readers, Hospital 1, for completeness of...

Fig. 7: Distribution of image rating for upper edge of image, 2 readers Hospital 1

Results

The readers’ assessments across all hospitals were generally in fair to moderate agreement (Fleiss’ kappa 0.32 to 0.50), as shown in Table 1. The agreement between the two readers within each hospital was of the same order of magnitude, with some exceptions.

Table 1: Kappa values across hospitals (overall) and between the two readers at each hospital

Image quality aspect	Overall	Hospital 1	Hospital 2	Hospital 3
FOV North	0.46	0.57	0.42	0.21
FOV East	0.50	0.48	0.27	0.49
FOV West	0.43	0.36	0.17	0.47
FOV South	0.49	0.57	0.51	0.46
Rotation	0.32	0.08	0.42	0.37
Inhalation	0.43	0.15	0.34	0.60

Interpretation: 0.01 – 0.20 slight agreement, 0.21 – 0.40 fair agreement,
0.41 – 0.60 moderate agreement, 0.61 – 0.80 substantial agreement,
0.81 – 1.00 almost perfect agreement.
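
The interpretation bands can be applied to the overall values of Table 1 mechanically. The following is a minimal sketch, not part of the original analysis: the function name `agreement_label` and the label used for values at or below zero are assumptions made for illustration, and band limits are treated as inclusive upper bounds.

```python
# Agreement bands as listed in the interpretation note of Table 1.
# Each entry is (inclusive upper limit, label).
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def agreement_label(kappa: float) -> str:
    """Return the agreement band for a kappa value (hypothetical helper)."""
    if kappa <= 0:
        # Values at or below zero are not covered by the listed bands;
        # this label is an assumption.
        return "no agreement beyond chance"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Overall kappa values copied from Table 1.
OVERALL = {
    "FOV North": 0.46,
    "FOV East": 0.50,
    "FOV West": 0.43,
    "FOV South": 0.49,
    "Rotation": 0.32,
    "Inhalation": 0.43,
}

for aspect, kappa in OVERALL.items():
    print(f"{aspect}: {kappa:.2f} -> {agreement_label(kappa)}")
```

With these bands, rotation (0.32) is the only aspect whose overall value falls in the fair range; all other aspects fall in the moderate range.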

The lowest agreement was observed for patient rotation, with a kappa value of 0.32 across sites and values as low as 0.08 within hospitals. This can partly be attributed to the fact that readers not only assessed whether the clavicles were symmetric to the center line of the spine, but also rated the severity of the asymmetry as inadequate or unacceptable. Here, the ratings varied considerably, as shown in Fig. 5. When the data were evaluated only with respect to a rating as symmetrical or asymmetrical, the agreement was markedly higher, reaching substantial agreement in some instances (kappa 0.24 to 0.73). Similar observations were made for inhalation, as illustrated in Fig. 6. Kappa values increased to 0.44 to 0.69 when only the distinction between complete and incomplete was analyzed, without differentiating between inadequate and unacceptable.
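
The effect of merging the two severity grades can be illustrated with a small sketch. It uses Cohen's kappa for two readers, which is an assumption, since the poster does not state which statistic was used for the reader pairs; the ratings below are invented for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who rated the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(a)
    # Observed agreement: share of items with identical ratings.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's category frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa undefined: only one category was used")
    return (observed - expected) / (1 - expected)


def collapse(ratings):
    """Merge 'inadequate' and 'unacceptable' into one 'asymmetric' class."""
    return ["symmetric" if r == "symmetric" else "asymmetric" for r in ratings]


# Hypothetical rotation ratings for ten images (not study data).
reader_1 = ["symmetric", "symmetric", "inadequate", "inadequate", "unacceptable",
            "unacceptable", "symmetric", "inadequate", "unacceptable", "symmetric"]
reader_2 = ["symmetric", "symmetric", "unacceptable", "inadequate", "inadequate",
            "unacceptable", "symmetric", "unacceptable", "inadequate", "inadequate"]

three_level = cohen_kappa(reader_1, reader_2)
two_level = cohen_kappa(collapse(reader_1), collapse(reader_2))
print(f"three-level kappa: {three_level:.2f}")
print(f"two-level kappa:   {two_level:.2f}")
```

In this invented example the readers mostly agree on whether an image is asymmetric but often disagree on the severity grade, so kappa rises from about 0.25 for the three-level rating to about 0.78 after collapsing.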

Marked differences were also seen in the assessment of the field of view, indicated by the distance between the lung fields and the image edge, especially for fields of view perceived as too wide. An example of the ratings is given in Fig. 7.
This would imply a strongly differing threshold between good and unacceptable image quality, if implemented in an automated image quality analysis tool [8]. As this parameter also impacts patient dose, it should be given careful consideration.