Background:
The first use of DL methods in CADx systems for breast cancer diagnosis was made by Sahiner et al. in 1996 [2], where a convolutional neural network (CNN) was used to classify mass and normal breast tissue.
This approach began with an image preprocessing phase, where a region of interest (ROI) was selected and a series of operations (e.g., filters) was applied to facilitate the learning process of the CNN.
This two-stage approach was adopted by other solutions, varying the preprocessing methods and the input image modality (e.g., mammography, MRI, or tomosynthesis).
Mammography is the most widely used exam for classification/segmentation tasks [2, 3, 4, 5].
However, MRI [6] and tomosynthesis [7] have been gaining ground, although their use remains limited by the scarcity of samples for training DL models.
Recent approaches (including ours) use features automatically extracted by DL models, sometimes combining the DL-learnt features with handcrafted features designed by experts [5, 8, 9].
Base Approach:
We take as our starting point our previous work (Arevalo et al., 2015 [9]), which separates the classification task into three main stages: (i) preprocessing of the input images, (ii) supervised feature learning using a DL model, and (iii) supervised learning of a classifier (e.g., a Support Vector Machine (SVM) and/or a Random Forest) to classify benign and malignant mass tumours.
The preprocessing phase aims to enhance image details to facilitate feature extraction and improve its performance. It comprises the following steps (a code sketch follows the list):
- The ROI is cropped, corresponding to a square around the lesion already outlined by an expert. This reduces image size, thus speeding up the training process;
- A data augmentation process is applied to increase the number of samples by rotating the cropped images by 90, 180, and 270 degrees. This gives the model more data to learn from, possibly improving its performance, and it also helps prevent overfitting by increasing the diversity of the data;
- A global and local contrast normalization process is applied to all the samples. These filters are useful to overcome effects such as lighting differences between film images.
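As an illustration, the sketch below implements these three steps in Python with NumPy/SciPy. The ROI size, the Gaussian window used for local normalization, and the function names are our own illustrative assumptions, not the exact settings of the original pipeline.

```python
# Illustrative sketch of the preprocessing phase (crop, rotation augmentation,
# global and local contrast normalization). Parameter values are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def crop_roi(image, cx, cy, half_size):
    """Crop a square ROI centred on the expert-outlined lesion at (cx, cy)."""
    return image[cy - half_size:cy + half_size, cx - half_size:cx + half_size]

def augment_rotations(roi):
    """Data augmentation: the original ROI plus its 90/180/270-degree rotations."""
    return [np.rot90(roi, k) for k in range(4)]

def global_contrast_normalization(roi, eps=1e-8):
    """Zero-mean, unit-variance normalization over the whole ROI."""
    return (roi - roi.mean()) / (roi.std() + eps)

def local_contrast_normalization(roi, sigma=4.0, eps=1e-8):
    """Subtract a local (Gaussian-smoothed) mean and divide by the local std."""
    local_mean = gaussian_filter(roi, sigma)
    centred = roi - local_mean
    local_std = np.sqrt(gaussian_filter(centred ** 2, sigma))
    return centred / (local_std + eps)

def preprocess(image, cx, cy, half_size=64):
    """Full preprocessing: crop, augment, then normalize each rotated copy."""
    roi = crop_roi(image.astype(np.float32), cx, cy, half_size)
    return [local_contrast_normalization(global_contrast_normalization(r))
            for r in augment_rotations(roi)]
```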
Once prepared, the images are used to train a CNN that learns a simpler representation (a feature vector). A binary ML classifier (e.g., an SVM) is then trained on the produced feature vectors and later used to classify a breast mass tumour as benign or malignant.
The use of this representation learning approach is justified by the reduction in training time: it is cheaper to train an SVM on feature representations than to train a CNN on full image representations.
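A minimal sketch of this two-stage scheme is shown below, assuming a Keras CNN and a scikit-learn SVM; the layer sizes and the x_train/y_train arrays are hypothetical placeholders, not the exact architecture of [9].

```python
# Two-stage sketch: a CNN is trained with supervision, its penultimate dense
# layer is reused as a feature extractor, and an SVM is trained on the features.
import tensorflow as tf
from sklearn.svm import SVC

def build_cnn(input_shape=(128, 128, 1)):
    """Small CNN whose penultimate dense layer is the learnt feature vector."""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 5, activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(64, 5, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    features = tf.keras.layers.Dense(256, activation="relu", name="features")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(features)
    return tf.keras.Model(inputs, outputs)

# Supervised feature learning: train end-to-end on benign/malignant labels.
cnn = build_cnn()
cnn.compile(optimizer="adam", loss="binary_crossentropy")
# cnn.fit(x_train, y_train, epochs=10)  # x_train: preprocessed ROIs, y_train: 0/1

# Feature extraction: reuse the penultimate layer as a compressed representation.
extractor = tf.keras.Model(cnn.input, cnn.get_layer("features").output)
# train_feats = extractor.predict(x_train)
# svm = SVC(kernel="rbf").fit(train_feats, y_train)  # cheap to train on vectors
```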
This approach reached an Area Under the ROC Curve (AUC) of 0.822, outperforming the values we had previously obtained using computer vision methods such as HOG or HGD descriptors to represent an image/exam [10]. When a set of handcrafted features was added, the model reached an AUC of 0.826.
System Overview:
We aim to improve the base approach presented above [9]; our goal is to increase the performance of classifying benign and malignant breast mass tumours.
For this purpose, we use instances of the BCDR benchmarking dataset [11] (https://bcdr.eu/).
After the preprocessing phase (see Figure 1), the feature extraction model was extended, following a well-known architecture with proven feature extraction performance, in order to attain better classification performance. The following combinations were tested (a sketch of both follows the list):
- Use of a Deep Learning model to perform both feature extraction and the classification task, by adding a Softmax classification layer as the output of the DL model. When predicting a sample, the model produces a binary classification (benign or malignant);
- Use of a Deep Learning model to perform feature extraction only, later feeding the learnt features to a separate classifier.
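The sketch below illustrates both combinations, assuming the Keras implementation of InceptionV3 (introduced next) with ImageNet weights; the input shape and head configuration are illustrative assumptions rather than our exact customization.

```python
# Sketch of the two tested combinations on top of an InceptionV3 backbone.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    pooling="avg", input_shape=(299, 299, 3))

# (i) End-to-end classification: a Softmax output layer on top of the DL model,
# producing a binary benign/malignant prediction directly.
head = tf.keras.layers.Dense(2, activation="softmax")(base.output)
end_to_end = tf.keras.Model(base.input, head)
end_to_end.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# (ii) Feature extraction: the pooled activations serve as feature vectors that
# are later fed to a separate classifier (e.g., an SVM, as in the base approach).
feature_extractor = tf.keras.Model(base.input, base.output)
# feats = feature_extractor.predict(x_rois)  # x_rois: preprocessed ROI batch
```

Variant (i) keeps the whole pipeline differentiable, while variant (ii) decouples representation learning from classification, mirroring the base approach.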
We use a customized variant of Google's InceptionV3 architecture [12]. This architecture is an update to GoogLeNet, the winner of the 2014 ImageNet contest, and was also trained on the ImageNet dataset, achieving a top-1 accuracy of 78.0% and a top-5 accuracy of 93.9%.
The feature learning process gives a compressed representation of the input image, allowing us to reduce training times and to experiment with different classifiers using the same features extracted by the DL model, as the sketch below illustrates.
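Because the feature vectors are compact and computed once, several classical classifiers can be compared cheaply on the same representation. The sketch below uses random placeholder data in place of the extractor's output; the sample counts and classifier settings are illustrative assumptions.

```python
# Comparing classifiers on the same cached DL features (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 2048))   # placeholder for feature_extractor.predict(...)
labels = rng.integers(0, 2, size=100)  # placeholder benign/malignant labels

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("Random Forest", RandomForestClassifier(n_estimators=200))]:
    # Cross-validated AUC, the evaluation metric reported for the base approach.
    scores = cross_val_score(clf, feats, labels, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```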