Artificial Intelligence, CT-High Resolution, Computer Applications-Detection, diagnosis, Tissue characterisation
C. H. Leow, V. Corona, P. Yousefi, M. Purtorab, K. Tait, S. Mohammadi, B. Irving, G. Brüggenwerth
Methods and materials
Retrospective data from IIP subjects spanning mild to severe disease, including CPFE, were drawn from the RISE-IIP study and from NHS data sets obtained through strategic research agreements. All scans were acquired with multi-detector HRCT across different scanner types, vendors and kernel settings, with a collimation of approximately 1.25 mm. The training dataset consisted of 850 annotated HRCT slices from 82 volumes. The independent test dataset contained 132 annotated slices from 130 volumes. Annotations were made by an image analyst specialising in ground truth annotation for medical imaging, under regular supervision by a radiologist with expertise in interstitial lung diseases. A subset of the test set was also used to assess interobserver variation. During the project, the radiologist periodically reviewed the test set and revisions were made to ensure the annotations were clinically accurate.
Ground truth generation
Pixelwise annotations were created by the analyst. The volumes were annotated excluding the hilar structures (pulmonary vessels and airways) up to approximately the lobar branches. Three classes were used to define pathology: “Fibrosis”, “Emphysema” and “Other”, where “Other” covers both normal lung and abnormal lung that is neither emphysema nor fibrosis. Single slice annotations were also created for the test set.
Deep learning model and training
A convolutional neural network with a 2D U-net architecture was trained to identify pulmonary fibrosis and emphysema and to quantify the detected pathology. The number of layers followed the original U-net architecture, except for the last convolutional layer, which maps the feature output to 4 classes (non-lung, fibrosis, emphysema, other). The model was implemented in TensorFlow and optimized by minimizing a modified focal loss with the Adam optimizer for 200 epochs with a batch size of 16. 20% of the training set was held out for validation during training. The initial learning rate was 0.001, with a “reduce learning rate on plateau” schedule, and early stopping was applied.
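For readers unfamiliar with focal loss, the following is a minimal sketch of the standard multi-class form, FL = -(1 - p_t)^gamma * log(p_t), for a single pixel over the four output classes. It is not the modified loss used in this work, whose details are not given here. The function names, the value gamma = 2.0 and the clipping constant are illustrative assumptions.

```python
import math

# Class order is an assumption made for this illustration only.
CLASSES = ("non-lung", "fibrosis", "emphysema", "other")


def softmax(logits):
    """Convert raw per-class scores into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, eps=1e-7):
    """Standard focal loss for one pixel.

    target is the index of the true class in CLASSES.
    gamma = 0 reduces the expression to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def mean_focal_loss(batch_logits, batch_targets, gamma=2.0):
    """Average the per-pixel focal loss over a batch of pixels."""
    losses = [
        focal_loss(logits, target, gamma)
        for logits, target in zip(batch_logits, batch_targets)
    ]
    return sum(losses) / len(losses)
```

The (1 - p_t)^gamma factor shrinks the contribution of pixels that are already classified confidently, which is why focal losses are often chosen when one class, such as non-lung background, dominates the image.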
Performance of the deep-learning (DL) approach was evaluated by comparing the algorithm's predictions with those of a baseline thresholding approach. In brief, the baseline approach classifies voxels with Hounsfield Units (HU) between -700 and 0 HU as fibrotic and voxels below -950 HU as emphysematous. We report Pearson correlation coefficients of our DL approach, the analyst’s annotations, and the baseline model with respect to expert (or expert-trained) annotations.
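The baseline and the correlation metric can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the study: the inclusive treatment of the -700 and 0 HU endpoints, the restriction to voxels already inside a lung mask, and the use of per-case disease extent (percentage of lung voxels) as the correlated quantity are all assumptions, as are the function names.

```python
import math

# HU thresholds taken from the text; endpoint handling is an assumption.
FIBROSIS_HU_RANGE = (-700.0, 0.0)
EMPHYSEMA_HU_MAX = -950.0


def classify_voxel(hu):
    """Label one lung voxel using the HU thresholds of the baseline."""
    if hu < EMPHYSEMA_HU_MAX:
        return "emphysema"
    if FIBROSIS_HU_RANGE[0] <= hu <= FIBROSIS_HU_RANGE[1]:
        return "fibrosis"
    return "other"


def extent_percent(lung_hu_values, label):
    """Percentage of lung voxels assigned to the given label."""
    hits = sum(1 for hu in lung_hu_values if classify_voxel(hu) == label)
    return 100.0 * hits / len(lung_hu_values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

In use, `extent_percent` would be computed once per case for each method, and `pearson_r` would then compare each method's list of extents against the list derived from the reference annotations.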