In this study, we built a convolutional neural network (CNN) using Keras, a library framework for neural networks (NN). Table 1 shows the CNN architecture. The input tensor of the CNN has the shape (image_height, image_width, image_channels); it does not include the batch dimension. The output of each Conv2D and MaxPooling2D layer is a three-dimensional tensor with the shape (height, width, channels). The width and height tend to shrink as the network gets deeper, whereas the number of channels is controlled by the first argument passed to the Conv2D layer. The three-dimensional output of the last of these layers is then supplied to the fully connected layers. Because the classifier processes one-dimensional tensors, this output is first passed through Flatten, which transforms the three-dimensional output into a one-dimensional tensor. The rectified linear unit [3] was employed as the activation function, followed by batch normalization [4]. The final output is a binary classification of safe or fail, and the network was trained using the binary cross-entropy loss.
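The shrinking of the height and width described above can be traced with a few lines of arithmetic. The following is only a sketch under stated assumptions: the filter counts (32, 64, 128), the 3 × 3 kernels, the 2 × 2 pooling windows, the unpadded ("valid") convolutions, and the single image channel are hypothetical and are not taken from Table 1.

```python
def conv2d_output(shape, filters, kernel=3):
    """Output shape of a stride-1, unpadded ('valid') convolution.

    `filters` plays the role of the first argument passed to Conv2D:
    it sets the number of output channels.
    """
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)


def maxpool2d_output(shape, pool=2):
    """Output shape of non-overlapping max pooling; channels are unchanged."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def flatten_output(shape):
    """Length of the one-dimensional tensor produced by Flatten."""
    height, width, channels = shape
    return height * width * channels


def trace_shapes(input_shape, filter_counts):
    """List the tensor shape after every Conv2D and MaxPooling2D layer."""
    shapes = [input_shape]
    shape = input_shape
    for filters in filter_counts:
        shape = conv2d_output(shape, filters)
        shapes.append(shape)
        shape = maxpool2d_output(shape)
        shapes.append(shape)
    return shapes, flatten_output(shape)


# Assumed input: a 128 x 128 image with one channel (batch dimension omitted).
shapes, flat_length = trace_shapes((128, 128, 1), [32, 64, 128])
for shape in shapes:
    print(shape)
print("Flatten length:", flat_length)
```

With these assumed settings, the spatial size falls from 128 to 14 over three Conv2D and MaxPooling2D pairs while the channel count grows to 128, and Flatten hands a vector of 14 × 14 × 128 values to the fully connected layers.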
For the NN data set, we imaged a knee phantom and obtained 316 radiographic images, each with 1892 × 2192 pixels. A human observer then classified the images into 95 safe images and 221 fail images. A region of interest of 1024 × 1024 pixels was set, and each image was cropped so that the knee joint surface was at the center of the image. Of the data set, 20% was used as the evaluation data set and the remaining 80% as the training data set. However, because of the limited memory capacity of the computer, the images were reduced to 128 × 128 pixels using the nearest neighbor method [5].
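The nearest neighbor reduction and the 80/20 split can be illustrated as follows. This is a minimal sketch, not the code used in the study: the function names, the index-mapping convention (libraries differ in how they pick the "nearest" source pixel), the fixed random seed, and the rounding of the 20% share are all assumptions.

```python
import random


def resize_nearest(image, new_height, new_width):
    """Resize a 2D image (list of rows) by copying the nearest source pixel.

    No averaging or interpolation is performed, so every output value
    is a value that already exists in the input image.
    """
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[(row * old_height) // new_height][(col * old_width) // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


def split_dataset(items, eval_fraction=0.2, seed=0):
    """Shuffle the items and split them into (training, evaluation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy 8 x 8 image reduced to 2 x 2; the study reduces 1024 x 1024 to 128 x 128.
toy_image = [[row * 8 + col for col in range(8)] for row in range(8)]
toy_small = resize_nearest(toy_image, 2, 2)
print(toy_small)

# Splitting 316 image indices into training and evaluation sets.
train_ids, eval_ids = split_dataset(range(316))
print(len(train_ids), len(eval_ids))
```

The same `resize_nearest` call with `new_height=128, new_width=128` applies to a 1024 × 1024 region of interest, where each output pixel is taken from one pixel of the corresponding 8 × 8 block.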
In the evaluation data set, the accuracy (correct answer rate) of the CNN was 60%. This low value is possibly due to underfitting caused by the small size of the data set [6]. Therefore, to enlarge the data set, the phantom radiographic images were rotated about the knee joint surface by 30, 60, and 90 degrees, and the images thus obtained were added to the data set. As a result, the data set was expanded to 515 safe images and 515 fail images.
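One way to picture this augmentation step is a digital rotation of each cropped image about its center, where the knee joint surface was placed by the cropping. The text does not state whether the rotation was applied digitally or by re-imaging the phantom, so the sketch below is an assumption on that point; the function name, the nearest neighbor sampling, and the zero fill value for uncovered corners are also illustrative choices.

```python
import math


def rotate_nearest(image, degrees, fill=0):
    """Rotate a 2D image (list of rows) about its center.

    Uses inverse mapping with nearest neighbor sampling: for each output
    pixel, look up the source pixel that rotates onto it. Pixels whose
    source falls outside the image receive `fill`. With this sign
    convention, positive angles turn the image clockwise on screen.
    """
    height, width = len(image), len(image[0])
    center_y, center_x = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    rotated = [[fill] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            dy, dx = row - center_y, col - center_x
            src_x = cos_t * dx + sin_t * dy + center_x
            src_y = -sin_t * dx + cos_t * dy + center_y
            src_row, src_col = int(round(src_y)), int(round(src_x))
            if 0 <= src_row < height and 0 <= src_col < width:
                rotated[row][col] = image[src_row][src_col]
    return rotated


# Toy example: one 3 x 3 image plus its three rotated copies.
toy = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
augmented = [toy] + [rotate_nearest(toy, angle) for angle in (30, 60, 90)]
print(len(augmented))
```

Rotations by 30 and 60 degrees leave the corners of a square image uncovered, which is why a fill value is needed; how such regions were handled in the study is not stated.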
The learning conditions were a batch size of 256 and 300, 1000, or 2000 epochs. A batch is a learning unit randomly selected from the training data set, and the batch size is the number of images it contains. In the CNN, learning is performed for each batch. One epoch is completed when learning has been performed on all the training data. In addition, machine learning was performed with the L2 regularization parameter set to 0.1, 0.01, and 0.001.
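The relation between batch size, epochs, and the L2 parameter can be sketched as below. Several points are assumptions rather than statements from the text: that every epoch count was combined with every L2 value, that the training set holds 824 images (80% of the expanded 1030), and that the L2 penalty takes the common form of the parameter times the sum of squared weights.

```python
import itertools
import math

BATCH_SIZE = 256
EPOCH_COUNTS = (300, 1000, 2000)
L2_PARAMETERS = (0.1, 0.01, 0.001)


def batches_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Number of weight updates needed to pass over all training data once."""
    return math.ceil(n_train / batch_size)


def l2_penalty(weights, l2_parameter):
    """Penalty added to the loss: l2_parameter * sum of squared weights."""
    return l2_parameter * sum(w * w for w in weights)


# Assumed full grid: each epoch count paired with each L2 parameter.
conditions = list(itertools.product(EPOCH_COUNTS, L2_PARAMETERS))

n_train = 824  # hypothetical: 80% of 515 + 515 images
updates = batches_per_epoch(n_train)
for epochs, l2_parameter in conditions:
    print(f"epochs={epochs}, L2={l2_parameter}, total updates={epochs * updates}")
```

Under these assumptions one epoch consists of only a few batches, so the number of epochs largely determines how many weight updates the network receives.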