Transfer Learning with Convolutional Neural Networks for Classification of Abdominal Ultrasound Images

Phillip M Cheng, Harshawn S Malhi

Abstract

The purpose of this study is to evaluate transfer learning with deep convolutional neural networks for the classification of abdominal ultrasound images. Grayscale images from 185 consecutive clinical abdominal ultrasound studies were categorized into 11 categories based on the text annotation specified by the technologist for the image. Cropped images were rescaled to 256 × 256 resolution and randomized, with 4094 images from 136 studies constituting the training set, and 1423 images from 49 studies constituting the test set. The fully connected layers of two convolutional neural networks based on CaffeNet and VGGNet, previously trained on the 2012 Large Scale Visual Recognition Challenge data set, were retrained on the training set. Weights in the convolutional layers of each network were frozen to serve as fixed feature extractors. Accuracy on the test set was evaluated for each network. A radiologist experienced in abdominal ultrasound also independently classified the images in the test set into the same 11 categories. The CaffeNet network classified 77.3% of the test set images accurately (1100/1423 images), with a top-2 accuracy of 90.4% (1287/1423 images). The larger VGGNet network classified 77.9% of the test set accurately (1109/1423 images), with a top-2 accuracy of 89.7% (1276/1423 images). The radiologist classified 71.7% of the test set images correctly (1020/1423 images). The differences in classification accuracies between both neural networks and the radiologist were statistically significant (p < 0.001). The results demonstrate that transfer learning with convolutional neural networks may be used to construct effective classifiers for abdominal ultrasound images.

Keywords: Artificial neural networks; Classification; Deep learning; Digital image processing; Machine learning.

Figures

Fig. 1
Layer structures of the modified a CaffeNet and b VGGNet neural networks used in the study. Numbers in brackets indicate the number of nodes within a layer of the neural network. CONV = convolutional layer, FC = fully connected layer
Fig. 2
Learning curves for a CaffeNet and b VGGNet. Loss curves indicate the training cross-entropy loss as a function of the training iteration. The test curves provide information on the loss function and classification accuracy of the test set during training, but were not used to optimize training hyperparameters
Fig. 3
Confusion matrices for a CaffeNet, b VGGNet, and c an ultrasound radiologist. Numbers in each box indicate the number of images corresponding to each combination of predicted and true labels. Counts of correctly labeled images are along the diagonal
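A confusion matrix of the kind shown in Fig. 3 tabulates predicted against true labels, with correct classifications on the diagonal. A minimal sketch with scikit-learn, using three toy categories rather than the study's 11 (the labels and counts below are illustrative, not the paper's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy true and predicted labels for three hypothetical categories (0, 1, 2).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true labels, columns are predicted labels; the diagonal holds
# the counts of correctly classified images.
cm = confusion_matrix(y_true, y_pred)
n_correct = int(np.trace(cm))
```

Overall accuracy is then the diagonal sum divided by the total image count, mirroring the 1100/1423 and 1109/1423 figures reported in the abstract.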
Fig. 4
Venn diagram for images correctly classified by the two neural networks (CaffeNet = dashed circle, VGGNet = dotted circle) and the radiologist (solid circle). The areas of the diagram are approximately proportional to the number of images; the large common area in the center represents the 799 images classified correctly by both neural networks and the radiologist. A total of 123 images were incorrectly classified by both neural networks and the radiologist
Fig. 5
Difference confusion matrices comparing incorrectly classified images between a the radiologist and CaffeNet and b the radiologist and VGGNet. Positive numbers indicate excess errors by the radiologist compared to the neural networks; negative numbers indicate excess errors by the neural networks compared to the radiologist. For clarity, counts of correctly classified images along the diagonal are omitted
Fig. 6
Visualization by t-SNE of CaffeNet’s high dimensional vector representations of the 4094 training set images. Images with a similar high dimensional vector representation are displayed close to each other in this map
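A map like Fig. 6 can be produced by running t-SNE on the network's penultimate-layer feature vectors, projecting them to two dimensions so that similar high-dimensional representations land near each other. A minimal sketch with scikit-learn; the feature matrix below is random stand-in data (the study extracted real CNN features from 4094 training images), and the dimensions and perplexity are assumptions:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for per-image CNN feature vectors (e.g., a fully connected layer's
# activations); here 100 random 64-dim vectors instead of real features.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))

# Project to 2-D coordinates for plotting; nearby points in this embedding
# correspond to images with similar high-dimensional representations.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
```

Each row of `embedding` gives the 2-D position at which the corresponding image thumbnail would be drawn.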
Fig. 7
Examples of misclassified images. The correct technologist label appears above each image; the bar graph below each image depicts the top three category probabilities given by the CaffeNet network, with the dark bar corresponding to the correct image label. Images (a) and (b) were incorrectly classified by the radiologist but correctly classified by both neural networks. Images (c) and (d) were correctly classified by the radiologist but incorrectly classified by both neural networks

Source: PubMed
