Design Cycle
“It's just aggravating to have to move and shuffle all these windows… shuffle between the list and your [Brand Name] dictation software… [or] Google Chrome or Internet Explorer, to search for something on there. Everything's just opening on top of each other, which is aggravating.” - UX interview with Interventional Radiologist, USA
The design of the entire user experience of our AI tool has involved radiologists and other clinicians at every step, generating feedback to ensure that the software is usable in the intended work environment with minimal workflow disruption[11]. The design is iteratively refined through “rounds” of radiologist feedback, each involving 4-7 radiologists, as shown in Figure 1. These sessions involve manipulation of the prototype during a structured interview and focus on clinical aspects of the design such as:
- The groupings and names of the 124 clinical findings
- Attitudes to the confidence bar
- Attitudes to the region of interest highlights
- The interaction of the widget with work software
This design cycle also emphasises interpretability of predictions, in recognition of its increasing importance. Drawing on techniques suggested by a growing body of research[12], we explored attitudes to interpretable predictions for clinically important findings in three main ways:
- Provision of confidence bars
- Provision of localisation maps
- Provision of differentials
Confidence Bar
“The x-rays are sort of black and white, but the actual diagnosis is, you know, it's not black and white and that's one of the criticisms of AI using the ground truth in that you shouldn't rely on reports because you put the same x-ray in front of three radiologists, there'll be different opinions.” - UX Interview with Radiologist, Australia
“I like that actually, because again, very few things in medicine are black and white… I really liked that there is a degree of uncertainty built into the system because it means that I can ultimately disagree with or agree with the AI and say, look, I'm not even sure that that thing is present. Neither is the AI.” - UX Interview with Emergency Doctor, Australia
Human radiologists are familiar with soft classifications such as “probable”, but AI tools must often artificially binarise predictions into definitely present or definitely absent[13]. Communicating model confidence allows a more nuanced and interpretable reading of a radiograph, in which human judgement complements the error-prone areas of the AI tool[14].
Our AI tool utilises a confidence bar, which has been refined through multiple rounds into a display of the model prediction relative to the threshold for positivity, together with the prediction uncertainty (Figure 2). Figures 3 and 4 both demonstrate the model calling a simple pneumothorax, but the call in Figure 4 is borderline, and the case is ground-truth negative for pneumothorax.
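As a hedged illustration of the information such a bar conveys, the sketch below renders a calibrated probability, an operating threshold, and an uncertainty band as text. The class names, the ±1.96 standard-deviation interval, and the rendering itself are illustrative assumptions, not our implementation.

```python
# Minimal sketch of the data behind a confidence bar (illustrative only):
# '#' marks the prediction, '|' the threshold for positivity, '~' the
# uncertainty band around the prediction.
from dataclasses import dataclass

@dataclass
class FindingConfidence:
    name: str           # finding label, e.g. "Pneumothorax" (hypothetical)
    probability: float  # calibrated model output in [0, 1]
    threshold: float    # operating point for calling the finding positive
    std_dev: float      # spread across e.g. an ensemble of models (assumed)

    def render(self, width: int = 41) -> str:
        def cell(v: float) -> int:
            # Map a value in [0, 1] to the nearest character cell.
            return round(v * (width - 1))
        lo = cell(max(0.0, self.probability - 1.96 * self.std_dev))
        hi = cell(min(1.0, self.probability + 1.96 * self.std_dev))
        bar = []
        for i in range(width):
            if i == cell(self.threshold):
                bar.append("|")   # threshold for positivity
            elif i == cell(self.probability):
                bar.append("#")   # model prediction
            elif lo <= i <= hi:
                bar.append("~")   # uncertainty band
            else:
                bar.append("-")
        call = "positive" if self.probability >= self.threshold else "negative"
        return f"{self.name:<22}[{''.join(bar)}] {call}"

# A confident call versus a borderline one (cf. Figures 3 and 4):
print(FindingConfidence("Simple pneumothorax", 0.82, 0.50, 0.04).render())
print(FindingConfidence("Simple pneumothorax", 0.53, 0.50, 0.08).render())
```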
Region of Interest (RoI) Map
“So what I like about this is at least it highlights areas to, to be mindful of, or at least your purple blobs there. Anyway,… I think it depicts better what might be an area of concern better than a bunch of words that say the same thing.” - UX interview with Interventional Radiologist, USA
“Yeah, I think it's good. Cause it's like you see the picture with the overlay on top of it, and then you can look at the same picture and see if there are real findings that are not.” - UX interview with Interventional Radiologist, USA
Saliency maps are important for interpretability, providing visual confirmation that an AI is paying attention to the correct region of the image. The classic cautionary tale in medical AI is in melanoma classification, where a photographed ruler or surgical skin markings influenced predictions of malignant skin lesions by a diagnostic AI tool[15][16][17]. Previous work has focused on using saliency maps generated by techniques such as Grad-CAM or Integrated Gradients to probe what a model is attending to[18][19][20][21].
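For readers unfamiliar with these techniques, the following is a minimal Grad-CAM sketch in the spirit of the cited work, not the method used by our tool; a torchvision ResNet-18 stands in for an actual chest X-ray model, and the choice of layer and class is illustrative.

```python
# Minimal Grad-CAM sketch: weight the last conv block's feature maps by the
# pooled gradients of the predicted class, then upsample to image size.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output        # feature maps from the last conv stage

def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0]     # gradients w.r.t. those feature maps

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a radiograph
logits = model(x)
logits[0, logits.argmax()].backward()   # gradient of the top class score

# Global-average-pool the gradients to weight each feature map, then ReLU
# keeps only regions that positively support the class.
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]
```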
Our AI model goes beyond this and provides explicit localisation. The model’s Y-net structure means the classification and the RoI map both reflect the model’s understanding of the image. This map helps to reassure the operator that the AI model is paying attention to the correct area of the image, and points to the area of the image triggering a finding if it is not obvious. If a RoI were not shown, the operator would instead be forced to scour the image to try to guess the feature that might have triggered the prediction, creating frustration and doubt. Figure 5 provides an example in which the RoI map aids the clinician in making a decision for a flagged acute rib fracture that was ultimately found to be an old rib fracture.
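As a hedged illustration of the general idea, not our actual architecture, the sketch below shows a Y-shaped network in which one shared encoder feeds both a classification head and an RoI decoder, so both outputs derive from the same representation; all layer sizes are assumptions.

```python
# Sketch of a Y-shaped network: shared encoder, two output branches.
import torch
import torch.nn as nn

class YNet(nn.Module):
    def __init__(self, n_findings: int = 124):
        super().__init__()
        # Shared encoder: greyscale radiograph -> downsampled feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Branch 1: classification head, one logit per finding.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_findings)
        )
        # Branch 2: decoder producing a per-finding RoI map at input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, n_findings, 2, stride=2),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.classifier(feats), self.decoder(feats)

logits, roi_maps = YNet()(torch.randn(1, 1, 256, 256))
# logits: (1, 124); roi_maps: (1, 124, 256, 256), one map per finding.
```

Because both branches read the same encoder features, the RoI map is not a post-hoc explanation bolted onto the classifier: the two outputs share one representation of the image.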
Differentials
“That's probably another, another good one… differential diagnosis. So, you know, there's a finding based on probability, chucks out, say, five different differentials. And quite honestly, as a human, I probably would only think of the top two or three. And then I'll look through that list and I'm like, Oh, okay. Maybe the fifth one is a reasonable choice.” - UX Interview with Radiologist, Australia
Providing multiple differentials means the operator can engage with the AI model more organically in a manner that is not simply black and white. This allows the operator to retain the power of decision making in the context of the patient’s clinical picture, as well as previous studies and patient history, which the model does not have access to.
Our AI model is deliberately allowed to predict multiple findings for a single radiographic feature, each with a different confidence/probability. This mirrors the organic way humans provide differentials for a radiographic finding. For example, an opacity may be labelled simultaneously as a “Focal Airspace Opacity”, a “Segmental Collapse”, and a “Pulmonary Mass”, each with its own confidence. An example of this behaviour can be seen in Figures 6 and 7, where the clinician is offered two possibilities to consider for an opacity.
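A minimal sketch of a mechanism that permits this behaviour, assuming a standard multi-label setup with independent per-finding sigmoid outputs (the logit values are made up): unlike a softmax, the probabilities need not sum to one, so several differentials can exceed the operating threshold at once.

```python
# Independent sigmoids (multi-label) vs a softmax "single winner":
# each finding is scored on its own, so co-occurring differentials survive.
import torch

findings = ["Focal Airspace Opacity", "Segmental Collapse", "Pulmonary Mass"]
logits = torch.tensor([1.6, 0.4, -0.3])  # hypothetical outputs for one image

probs = torch.sigmoid(logits)             # each finding scored independently
for name, p in zip(findings, probs):
    print(f"{name:<24} {p:.2f}")
# Focal Airspace Opacity   0.83
# Segmental Collapse       0.60
# Pulmonary Mass           0.43   <- more than one differential can be offered
```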