Semantic consistency generative adversarial network for cross-modality domain adaptation in ultrasound thyroid nodule classification

Jun Zhao, Xiaosong Zhou, Guohua Shi, Ning Xiao, Kai Song, Juanjuan Zhao, Rui Hao, Keqin Li

Abstract

Deep convolutional networks have been widely used for various medical image processing tasks. However, the performance of existing learning-based networks is still limited by the lack of large training datasets. When a general deep model is deployed directly on a new dataset with heterogeneous features, the effect of domain shift is usually ignored and performance degrades. In this work, we propose a new multimodal domain adaptation method for medical image diagnosis by designing the semantic consistency generative adversarial network (SCGAN). SCGAN performs cross-domain collaborative alignment of ultrasound images and domain knowledge. Specifically, we utilize a self-attention mechanism for adversarial learning between the two domains to overcome visual differences across modal data and preserve the domain invariance of the extracted semantic features. In particular, we embed nested metric learning in the semantic information space, thus enhancing the semantic consistency of cross-modal features. Furthermore, the adversarial learning of our network is guided by a discrepancy loss that encourages the learning of semantic-level content and a regularization term that enhances network generalization. We evaluate our method on a thyroid ultrasound image dataset for benign/malignant diagnosis of nodules. A comprehensive experimental study shows that SCGAN reaches 94.30% accuracy and 97.02% AUC on thyroid nodule classification, significantly outperforming state-of-the-art methods.
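The nested metric learning mentioned above can be read as penalizing semantic drift between paired image and text embeddings. A minimal sketch of one such consistency term, assuming cosine similarity as the metric (the function name and shapes are illustrative, not the paper's exact formulation):

```python
import numpy as np

def cosine_consistency_loss(img_emb, txt_emb):
    # One hedged reading of a semantic-consistency penalty: push paired
    # image/text embeddings (rows) toward cosine similarity 1.
    num = np.sum(img_emb * txt_emb, axis=-1)
    den = np.linalg.norm(img_emb, axis=-1) * np.linalg.norm(txt_emb, axis=-1)
    return float(np.mean(1.0 - num / den))
```

Identical pairs incur zero loss; orthogonal pairs incur a loss of 1 per pair.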

Keywords: Cross-modality domain adaptation; Domain knowledge; Self-attention mechanism; Semantic consistency; Thyroid nodule classification.

Conflict of interest statement

The authors declare that they have no conflict of interest.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.

Figures

Fig. 1
From left to right: the original ultrasound images of four benign/malignant nodules randomly sampled from the dataset, the corresponding “Ultrasound Findings” from the ultrasonography reports, and the domain knowledge. The red text marks disease keywords selected according to the TI-RADS standard
Fig. 2
Overview of our proposed SCGAN, consisting of a text encoder (top left), a domain adaptation generator Gst (bottom left), a discriminator Dt (top right), and a classifier (bottom right). Gst has two inputs, Z and the text encoding Tsenc produced by the text encoder, both of which feed into upblocks (gray boxes) for cross-domain fusion. The CASAM contained in the upblocks promotes semantic alignment during fusion. Similarly, Dt distinguishes real from synthesized images through a series of downblocks (gray boxes; Ist denotes the synthesized image, It the real image). The classifier is a modified ResNet-50. In particular, the adversarial loss is the hinge version of the adversarial loss, ℒvd is the visual discrepancy loss, and ℒCD−GP is the cross-domain fusion zero-centered gradient penalty function
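The hinge version of the adversarial loss named in the caption has a standard form. A minimal NumPy sketch, assuming scalar discriminator scores per image (function names are illustrative, not the paper's code):

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    # Hinge discriminator loss: penalize real scores below +1
    # and fake scores above -1 (margin-based, saturating at 0).
    return np.mean(np.maximum(0.0, 1.0 - d_real)) + \
           np.mean(np.maximum(0.0, 1.0 + d_fake))

def g_hinge_loss(d_fake):
    # Generator tries to raise the discriminator's score on fakes.
    return -np.mean(d_fake)
```

Well-separated scores (real above +1, fake below −1) drive the discriminator loss to zero.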
Fig. 3
Details of the cross-modal alignment self-attention module (CASAM). The semantic alignment layer focuses on the source-domain features corresponding to each target-domain pixel. ⊗ denotes the dot product operation
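The dot-product attention described in the caption can be illustrated as follows; a hedged NumPy sketch assuming flattened feature maps as row vectors (names and shapes are assumptions, not the paper's exact module):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target_feats, source_feats):
    """Attend from target-domain pixels (queries, shape (Nt, d)) to
    source-domain features (keys/values, shape (Ns, d)) via scaled
    dot products, returning source-aligned features of shape (Nt, d)."""
    d = source_feats.shape[-1]
    scores = target_feats @ source_feats.T / np.sqrt(d)  # (Nt, Ns)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ source_feats
```

Each output row is a convex combination of source-domain features, weighted by similarity to the corresponding target-domain pixel.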
Fig. 4
The overall image preprocessing pipeline. The blue dashed line indicates the vertical diameter of the nodule, and the green dashed line indicates the horizontal diameter
Fig. 5
ROC analysis of different image generation models with our SCGAN and its variants for thyroid nodule classification
Fig. 6
Loss curves of the generator and discriminator in SCGAN
Fig. 7
Inception score (IS) analysis for different parameters of the loss function
Fig. 8
Box and whisker plot analysis of the inception score (IS) for different image generation methods
Fig. 9
Visualization results of 24 images generated using DAGAN, CASAM (i.e., variant SCGAN−ℒvd−ℒCD−GP), and SCGAN


Source: PubMed
