Article Open access Peer reviewed

Classification of breast mass in two‐view mammograms via deep learning

2020; Institution of Engineering and Technology; Volume: 15; Issue: 2; Language: English

10.1049/ipr2.12035

ISSN

1751-9667

Authors

Hua Li, Jing Niu, Dengao Li, Chen Zhang

Topic(s)

Infrared Thermography in Medicine

Abstract

IET Image Processing, Volume 15, Issue 2, pp. 454-467. ORIGINAL RESEARCH PAPER, Open Access.

Classification of breast mass in two-view mammograms via deep learning

Hua Li, College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
Jing Niu, College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
Dengao Li (corresponding author, lidengao@tyut.edu.cn), College of Data Science, Taiyuan University of Technology, Taiyuan, China; Shanxi Engineering Technology Research Center for Spatial Information Network, Taiyuan, China
Chen Zhang, College of Information and Computer, Taiyuan University of Technology, Taiyuan, China

First published: 09 December 2020. https://doi.org/10.1049/ipr2.12035

Abstract

Breast cancer is the second deadliest cancer among women. Mammography is an important method for physicians to diagnose breast cancer. The main purpose of this study is to use deep learning to automatically classify breast masses in mammograms as benign or malignant. This study proposes a two-view mammogram classification model consisting of a convolutional neural network (CNN) and a recurrent neural network (RNN). The model is composed of two branch networks: two modified ResNets are used to extract breast-mass features from the craniocaudal (CC) view and the mediolateral oblique (MLO) view, respectively. To effectively utilise the spatial relationship between the two views, the gated recurrent unit (GRU) structure of the RNN is used to fuse the breast-mass features from the two views. The digital database for screening mammography (DDSM) is used for training and testing the model. The experimental results show that the classification accuracy, recall and area under the curve (AUC) of the method reach 0.947, 0.941 and 0.968, respectively.
Compared with previous studies, our method significantly improves the performance of benign and malignant classification.

1 INTRODUCTION

Cancer is a worldwide public health problem. Among cancer cases in women, breast cancer has the highest incidence [1]. According to statistics from the American Cancer Society, in 2020 there will be about 276,480 new cases of breast cancer in women, accounting for 30% of new cancer cases in women [2]. If breast cancer is detected early in its onset, the patient's five-year survival rate increases by 70% compared with advanced cancer [3]. Therefore, the early detection and treatment of breast cancer is extremely important for patients. Mammography has become the most widely used and effective detection method for breast cancer because of its low cost and ability to satisfy medical requirements [4]. Physicians reach a diagnosis mainly through the analysis of mammograms, but the result is easily affected by the physician's subjective experience and fatigue. In addition, because the features of breast masses are not obvious in the early stage, diagnosis from mammograms remains challenging even for experienced physicians. It is therefore necessary to use a computer-aided diagnosis (CAD) system to help physicians make a diagnosis. Relevant research shows that a reliable CAD system can help physicians make correct judgments and effectively reduce the burden on patients [5].

The traditional method for breast mass classification is based on pattern recognition. First, features are extracted from mammograms manually, and then the extracted features are input into a machine learning classifier [6]. Although traditional pattern recognition has made some achievements in mammogram classification, it relies on characteristics designed manually by researchers and lacks the ability to learn autonomously. The convolutional neural network (CNN) can effectively overcome this shortcoming: it automatically selects and extracts features from images and has achieved excellent performance in natural image analysis. CNNs have therefore attracted the attention of many researchers, who have applied them to the analysis and diagnosis of medical images such as lung CT [7], brain MRI [8] and thyroid ultrasound [9].

In the field of medical image analysis, some researchers have started using CNNs to diagnose breast masses [10]. These studies can be divided into two groups: classification based on whole mammograms and classification based on breast-mass patches. Early on, researchers tried to apply CNNs directly to the classification of whole mammograms. Zhang et al. [11] evaluated the classification performance of two classic CNN models, AlexNet and ResNet50, on whole mammograms, using data augmentation and transfer learning to improve performance. Similarly, Wang et al. [12] compared the classification performance of AlexNet, VGG16 and ResNet50 on whole mammograms, conducting experiments on three popular public databases; to accelerate convergence and improve classification performance, they used a pre-trained network as the feature extractor. Li et al. [13] used the Inception structure to construct a new CNN model, DenseNet II, to classify mammograms as benign or malignant.
The advantage of the Inception structure is that it contains convolution kernels at multiple scales, allowing it to attend to image information at different scales. Agnes et al. [14] also adopted a multi-scale convolution strategy to classify whole mammograms, using three different convolution kernels in each convolution layer to extract deep features so that the network attends to a wider range of image information. However, a whole mammogram is typically about 5000 × 3000 pixels, whereas the input image of a CNN is generally small. Directly resizing large mammograms to a small size discards many useful features; smaller breast masses may even become invisible, severely limiting the classification performance of the model. In addition, some researchers have explored segmentation of whole mammograms. One approach is to segment the pectoral muscle and the breast region [15], which separates the breast region from the image and reduces the interference of the background and pectoral muscle with feature extraction. Another approach is to automatically segment the lesion area from the whole image [16], which provides more accurate feature information for subsequent classification of the lesion.

To mitigate the problems of classifying benign and malignant breast cancer from whole mammograms, some researchers cropped mass patches from the whole mammograms and used these patches for classification, improving performance through different training strategies and by integrating different network models. Arora et al. [17] proposed a two-stage classification system. In the first stage, five parallel CNN structures, including GoogleNet, ResNet18 and Inception, extract features from breast-mass patches, and the five extracted feature vectors are concatenated into one. In the second stage, a neural network is trained on the concatenated vector to classify the mammograms. Sun et al. [18] mainly compared the classification performance of three different networks on breast-mass patches; they also compared random initialisation against pre-trained weights, and their experiments show that a pre-trained ResNet50 achieved the best results on the DDSM database. Chougrad et al. [19] sought to improve classification performance through fine-tuning, finding that an Inception v3 with only two fine-tuned convolutional blocks achieved the best results on the breast mass classification task.

Compared with the whole mammogram, more mass features can be extracted from a mass patch, which improves network performance. However, the feature information contained in a single patch is often limited, so some researchers have tried to improve classification performance by extracting features from multiple image patches. Lotter et al. [20] cropped two patches of the lesion area at different scales from the whole mammogram; two ResNets with the same structure extract features from the two patches, and the extracted features are fused to classify the mammogram.
Li et al. [21] proposed a two-path neural network model: one path extracts features from breast lesion patches, the other extracts features from segmentation mask maps, and the features from the two paths are finally concatenated to classify the lesion.

In clinical practice, mammography usually produces two views, axial and lateral, called the craniocaudal (CC) projection and the mediolateral oblique (MLO) projection. A lesion usually appears in both mammograms at the same time, but the displayed features differ slightly, so physicians usually need to combine the two mammograms of the same breast to judge the lesion. However, most researchers use only a single-view mammogram to classify breast cancer, which often fails to reflect the true classification outcome. Attending to mammograms from both views at the same time extracts more lesion features, which helps to improve classification performance [22].

In recent years, research on the recurrent neural network (RNN) has opened more room for improving CNNs. The RNN is an important branch of deep learning [23]; it has significant advantages in handling sequence data and is widely used for video, signal and text data. Recently, some researchers have combined CNNs and RNNs for image classification. Moitra et al. [7] proposed a method combining a CNN and an RNN to automatically classify lung cancer images. Li et al. [24] used a combined CNN-RNN model to analyse brain MRI images, analysing features of left and right hippocampal image patches to diagnose Alzheimer's disease. We note that analysing images with spatial relationships through an RNN can effectively improve the feature extraction ability and the classification performance of the model.

In this study, we address these problems and propose a two-view neural network (TV-NN) model to improve the performance of breast mass classification. Our contributions are the following:

- We combine depthwise separable convolution with the residual block to propose a classification-based CNN (BC-CNN), which effectively reduces network parameters and increases network speed. Two-path BC-CNNs extract features of mammograms from the CC view and the MLO view, respectively.
- We combine CNN and RNN to analyse two-view mammograms: the BC-CNNs extract features from the two views, and the features are fused through the RNN's gated recurrent unit (GRU) to make effective use of spatial features.
- We verify the proposed network pre-training method and demonstrate its effectiveness.
- We validate the TV-NN model on the DDSM database and obtain good results: the area under the ROC curve (AUC) is 0.968, the accuracy 0.947 and the recall 0.941.

2 DATABASE

The mammography images used in this article are collected from the digital database for screening mammography (DDSM) [25]. The database provides material for research on computer-aided diagnosis systems and is currently widely used in the field. The DDSM is divided into four types of data: cancer, normal, benign and benign without callback. There are 2620 cases available in 43 volumes, including 12 normal, 15 malignant, 14 benign and two undiagnosed volumes.
Each DDSM case contains mammograms of the left and right breasts from two views: left CC (LCC), left MLO (LMLO), right CC (RCC) and right MLO (RMLO). The location, shape, margin and benign or malignant status of the lesion are also provided. Figure 1 shows benign and malignant breast masses from the two views on the left and right sides.

FIGURE 1. Benign and malignant mammograms of breast cancer. (a) and (b) are benign, (c) and (d) are malignant; the two views are craniocaudal (CC) and mediolateral oblique (MLO), and L and R represent left and right, respectively. The red circle shows the location of the breast mass.

In the DDSM database, the physicians' diagnosis information is contained in an OVERLAY file. It includes the lesion type, the breast density, and the chain code of the lesion area segmentation. We converted the original LJPEG format to PNG format and extracted the coordinates of the outermost boundary points of the lesion area from the annotation file.

2.1 Image normalisation

Normalisation [26] not only accelerates convergence and increases model accuracy, but also alleviates, to some extent, the scattered feature distributions in deep networks. It makes the training of deep networks easier and more stable, so that a large learning rate can be used; standardisation has become standard practice in neural network training. In general, we normalise the input samples so that the data has zero mean and unit standard deviation:

$$\hat{x} = \frac{x - E(x)}{\sqrt{\mathrm{Var}(x)}} \quad (1)$$

where $E(x)$ is the mean of the samples and $\mathrm{Var}(x)$ is their variance.

2.2 Image patch extraction

In the DDSM database, each case contains a detailed annotation by the physicians, which includes the coordinates of the mass boundary. From this annotation, we constructed the bounding rectangle through the outermost points of the mass boundary and used the centre of this rectangle as the extraction centre. To ensure that the shape of the mass patch does not change when it is input to the network, we set a square crop area. Figure 2 shows the patch extraction strategy. To let the CNN attend to the information around the mass, we set the side length of the extracted area to twice the longer side of the rectangle. Because the manually extracted regions of interest differ in size, we resize the image patches to 512 × 512 pixels to facilitate training.

FIGURE 2. Image patch extraction. The longest side of the smallest rectangle containing the breast mass is a and its centre is point o; we crop the area containing the breast mass with a square centred on point o with side length 2a.
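To make the crop-and-normalise step concrete, here is a minimal Python sketch, assuming the mass boundary coordinates have already been parsed from the OVERLAY file into arrays of x and y pixel positions; the function and variable names are illustrative, not from the paper:

```python
import numpy as np
import cv2

def extract_mass_patch(image, xs, ys, out_size=512):
    """Crop a square patch of side 2a centred on the mass and resize it.

    image: 2-D numpy array (a mammogram already converted from LJPEG to array form).
    xs, ys: boundary point coordinates of the mass from the annotation.
    """
    # Bounding rectangle of the mass boundary and its centre point o.
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    a = max(x1 - x0, y1 - y0)           # longer side of the rectangle

    # Square crop of side 2a centred on o, clipped at the image border
    # (border clipping is an assumption; the paper does not discuss edge cases).
    half = a
    top, left = max(cy - half, 0), max(cx - half, 0)
    patch = image[top:top + 2 * half, left:left + 2 * half]

    # Resize to the fixed network input size of 512 x 512.
    patch = cv2.resize(patch, (out_size, out_size)).astype(np.float32)

    # Standardise to zero mean and unit standard deviation, Equation (1).
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```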
2.3 Data division

The DDSM database contains a total of 891 cases with masses, comprising 970 masses (522 benign and 448 malignant). We divide all the masses proportionally into 10 subsets, each containing 97 masses. The first eight subsets each contain 52 benign and 45 malignant masses; the ninth and tenth subsets each contain 53 benign and 44 malignant masses.

2.4 Data augmentation

Deep learning requires a large amount of data to ensure accuracy and prevent overfitting, so data augmentation is applied to increase the amount of data given the small number of mammograms. Because of the positional correlation between the CC and MLO views, we consider only flip and translation strategies. We augment the data of the 10 subsets according to the following strategies, and the augmented data remain in their original subsets.

2.4.1 Flip

We flip the original images according to the following strategies: (a) the CC image is flipped up and down while the MLO image remains unchanged; (b) the CC image is flipped left and right, and the MLO image is also flipped left and right. Patches are then extracted from the flipped images according to the strategy in Section 2.2, and each flipped mass patch is treated as a new mass. Flipping thus expands the data by a factor of three (see the sketch at the end of this section).

2.4.2 Translation

To achieve robustness to position, we translate the image extraction area up, down, left and right by 10% to obtain more patches. We then randomly combine the five patches extracted from the CC image with the five patches extracted from the MLO image to form five pairs of two-view mammogram patches, expanding the data by a factor of five. Note that the nature of the mass is unchanged by data augmentation.

Through the two strategies, the data are expanded to 15 times the original. A total of 14,550 pairs of two-view mammogram patches were obtained, of which 7830 are benign and 6720 malignant.
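The flip strategies must keep the CC/MLO pair geometrically consistent. A minimal sketch of the paired flips, assuming both images are numpy arrays with rows running top to bottom (names are illustrative):

```python
import numpy as np

def paired_flips(cc, mlo):
    """Generate the flipped variants of a CC/MLO pair described in Section 2.4.1.

    Returns the original pair plus the two flip variants, tripling the data.
    """
    return [
        (cc, mlo),                                # original pair
        (np.flipud(cc), mlo),                     # (a) CC up-down, MLO unchanged
        (np.fliplr(cc), np.fliplr(mlo)),          # (b) both flipped left-right
    ]
```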
3 METHOD

We propose the TV-NN classification model. First, features are extracted from the two-view mammograms (CC and MLO) of the same side by BC-CNNs; the extracted features are then fused by the RNN; finally, images are classified as benign or malignant. Figure 3 shows the overall structure of TV-NN.

FIGURE 3. The overall architecture of the two-view neural network (TV-NN). The first step is data preprocessing: breast masses are extracted from the whole mammograms in both views. The second step is feature extraction: the proposed classification-based convolutional neural network (BC-CNN) extracts the characteristics of the breast masses. The third step is feature fusion: the extracted breast-mass features are input into the gated recurrent unit (GRU) structure for fusion.

The proposed method consists of the following steps:

1. The two breast mammograms from the two views are input into separate BC-CNNs and their features are extracted.
2. To fuse the features of the two views, the features extracted by the BC-CNNs are input into the recurrent neural network.
3. The fused features from the RNN are input into the softmax layer for classification.

3.1 Basic structure of CNN

In this study, the residual block is combined with depthwise separable convolution to form a network structure called the inverted residual block, on which the proposed BC-CNN is built.

3.1.1 Residual block

As a plain network gets deeper, classification results worsen and the gradient vanishes [27], slowing network convergence and degrading classification accuracy. ResNet proposed residual learning [28] to improve this situation: by introducing cross-layer connections, it solves the problem of vanishing gradients in backpropagation and makes very deep CNNs simple to train. Figure 4 compares traditional network connections with cross-layer residual connections.

FIGURE 4. 'Plain' layers (left) and residual block (right). Compared with ordinary network connections, the residual block adds a cross-layer connection.

The rectified linear unit (ReLU) activation function is used after the convolution operation to increase the non-linearity of the model: if the input is less than zero, ReLU outputs zero; otherwise the output equals the input:

$$R(x) = \max(0, x) \quad (2)$$

3.1.2 Depthwise separable convolution

Depthwise separable convolution [29] splits into a depthwise and a pointwise convolution. Each kernel of a standard convolution convolves with the data of all channels, whereas each kernel of the depthwise convolution convolves with only one channel; the pointwise convolution then correlates the feature maps produced by the depthwise convolution. The detailed structure is shown in Figure 5.

FIGURE 5. Ordinary convolution and depthwise separable convolution. Depthwise separable convolution decomposes the traditional convolution into a depthwise convolution and a pointwise (1 × 1) convolution.

3.1.3 Inverted residual block network structure

In this study, depthwise separable convolution replaces the standard convolution in the residual structure. A 1 × 1 convolution first expands and then reduces the dimension of the feature map, and the features are extracted by a 3 × 3 depthwise convolution. Because tensors with few channels lose information when passed through ReLU, a linear layer replaces the ReLU non-linearity after the final projection. Figure 6 shows the architecture of the inverted residual block.

FIGURE 6. Inverted residual block. On the left is a block with a stride of 1 connected by a shortcut; on the right is a block with a stride of 2 that performs the down-sampling operation.
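A minimal PyTorch sketch of such an inverted residual block, in the spirit of MobileNetV2; the expansion factor and batch-norm placement are assumptions, since the paper does not list them:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual: 1x1 expand -> 3x3 depthwise -> 1x1 linear projection."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        # Shortcut only for stride-1 blocks with matching shapes (left block in Figure 6).
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),            # 1x1 expansion
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),               # 3x3 depthwise
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),           # 1x1 projection
            nn.BatchNorm2d(out_ch),                          # no ReLU: linear output
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out
```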
3.1.4 Classification-based CNN

The BC-CNN is constructed using the inverted residual block described above. Table 1 shows the concrete structure of the BC-CNN.

TABLE 1. The BC-CNN architecture

Type/stride | Filter shape | Input size     | Output size    | Inverted residual blocks
Conv1/2     | 3 × 3 × 32   | 512 × 512 × 3  | 256 × 256 × 32 | –
Conv2/1     | 3 × 3 × 16   | 256 × 256 × 32 | 256 × 256 × 16 | 1
Conv3/2     | 3 × 3 × 16   | 256 × 256 × 16 | 128 × 128 × 16 | 1
Conv4/1     | 3 × 3 × 24   | 128 × 128 × 16 | 128 × 128 × 24 | 2
Conv5/2     | 3 × 3 × 24   | 128 × 128 × 24 | 64 × 64 × 24   | 1
Conv6/1     | 3 × 3 × 32   | 64 × 64 × 24   | 64 × 64 × 32   | 2
Conv7/2     | 3 × 3 × 32   | 64 × 64 × 32   | 32 × 32 × 32   | 1
Conv8/1     | 3 × 3 × 64   | 32 × 32 × 32   | 32 × 32 × 64   | 2
Conv9/1     | 3 × 3 × 96   | 32 × 32 × 64   | 32 × 32 × 96   | 3
Conv10/2    | 3 × 3 × 96   | 32 × 32 × 96   | 16 × 16 × 96   | 1
Conv11/1    | 3 × 3 × 160  | 16 × 16 × 96   | 16 × 16 × 160  | 1
Conv13/1    | 3 × 3 × 320  | 16 × 16 × 160  | 16 × 16 × 320  | 1
Conv14/2    | 3 × 3 × 320  | 16 × 16 × 320  | 8 × 8 × 320    | 1
Conv15/1    | 1 × 1 × 1280 | 8 × 8 × 320    | 8 × 8 × 1024   | –
FC          |              | 8 × 8 × 1024   | 1 × 1 × 1024   | –
softmax     |              |                |                |

Note: the network uses convolutions with a stride of 2 in place of max-pooling layers for subsampling.

We input the MLO and CC mammograms into the two BC-CNNs with the last fully connected layer removed; the extracted feature map has size 8 × 8 × 1024. We then apply the global average operation to reduce the image feature to a one-dimensional vector, which is input into the recurrent neural network.

3.2 Feature fusion based on GRU

The two views of a mammogram are correlated, and each view complements the information of the other. The RNN has a unique structure that makes it well suited to handling information related in time or space [30], and the features of the two-view mammograms are spatially related. An RNN consists of three parts: input, hidden and output units. The long short-term memory (LSTM) network [31] solves the vanishing and exploding gradient problems of the RNN. The GRU [32] merges the forget and input gates into a single update gate and carries only the output forward as its memory state, so its inputs and outputs are simpler than those of the LSTM. Features from the two-view mammograms are fused by two GRU modules, which share the same parameters; the working principle of the GRU is shown in Figure 7. The classification result is obtained with the softmax activation function:

$$\mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} \quad (3)$$

where $i = 1, \ldots, n$. The GRU updates its state as follows:

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t]) \quad (4)$$

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t]) \quad (5)$$

$$h'_t = \tanh(W' \cdot [r_t * h_{t-1}, x_t]) \quad (6)$$

$$h_t = (1 - z_t) * h_{t-1} + z_t * h'_t \quad (7)$$

where $[\,\cdot\,,\,\cdot\,]$ denotes the concatenation of two vectors and $*$ denotes the element-wise product.

FIGURE 7. Working principle of the GRU.
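A minimal PyTorch sketch of the fusion stage, assuming the two 1024-dimensional BC-CNN feature vectors are stacked as a length-2 sequence and fed through a shared GRU; the hidden size and classifier head are assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class TwoViewGRUFusion(nn.Module):
    """Fuse CC and MLO feature vectors with a shared GRU, then classify."""

    def __init__(self, feat_dim=1024, hidden=256, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, f_cc, f_mlo):
        # Treat the two views as a sequence of length 2: (batch, 2, feat_dim).
        seq = torch.stack([f_cc, f_mlo], dim=1)
        _, h_n = self.gru(seq)              # final hidden state: (1, batch, hidden)
        return self.fc(h_n.squeeze(0))      # logits; apply softmax for probabilities

# Usage with dummy BC-CNN feature vectors:
f_cc, f_mlo = torch.randn(4, 1024), torch.randn(4, 1024)
probs = TwoViewGRUFusion()(f_cc, f_mlo).softmax(dim=1)   # (4, 2) class probabilities
```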
3.3 Model-training strategy

Training the proposed TV-NN model includes pre-training the BC-CNNs and fine-tuning the GRU network for the specific classification task. In our implementation, the BC-CNN that extracts CC patch features is pre-trained with the augmented MLO image patches and, similarly, the BC-CNN that extracts MLO patch features is pre-trained with the augmented CC image patches. The pre-trained BC-CNNs are then fine-tuned with the augmented mammogram patches from the CC and MLO views, respectively, with a softmax function connecting the fully connected layer to the output category label. We then fix the parameters of all convolutional, pooling and fully connected layers in the BC-CNNs and, finally, fine-tune the GRU parameters using the two-view mammograms.

4 EXPERIMENTAL RESULTS AND DISCUSSION

In this study, we experiment with the preprocessed DDSM database. The method is verified by comparative experiments, and evaluation metrics are given to further analyse the experimental results. All experiments were implemented in Python 3.6 with the PyTorch framework; training and testing were done on a PC with a Core i7 CPU, 16 GB of RAM and two NVIDIA Titan X GPUs, running Ubuntu 18.04.

4.1 Experimental setup

We use k-fold cross-validation [33] for our experiments: each run selects k − 1 subsets as the training set and the remaining subset as the testing set. The experiment is carried out k times, an accuracy is obtained for each run, and the average accuracy over the k runs is taken as the final accuracy. The Adam optimiser is used to update the parameters, and the cross-entropy function is used to compute the error. The learning rate starts at 0.001 and is reduced to one tenth of its value every 100 epochs; input images are resized to 512 × 512 and training runs for a total of 300 epochs.
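The reported optimiser and schedule map directly onto standard PyTorch components. A minimal sketch, reusing the fusion module from Section 3.2 as a stand-in for the full TV-NN and a dummy loader in place of the real patch pairs:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = TwoViewGRUFusion()         # stand-in for the full TV-NN
criterion = nn.CrossEntropyLoss()  # cross-entropy error
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Reduce the learning rate to one tenth every 100 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

# Dummy stand-in for the real loader of (CC features, MLO features, label) triples.
train_loader = DataLoader(
    TensorDataset(torch.randn(32, 1024), torch.randn(32, 1024),
                  torch.randint(0, 2, (32,))),
    batch_size=8, shuffle=True)

for epoch in range(300):
    for f_cc, f_mlo, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(f_cc, f_mlo), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```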
4.2 Evaluation metrics

To quantitatively analyse the experimental results, we use several common evaluation metrics: accuracy, recall and the receiver operating characteristic (ROC) curve. The metrics are expressed in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN):

$$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (8)$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \quad (9)$$

The horizontal coordinate of the ROC curve is the false positive rate (FPR) and the vertical coordinate is the true positive rate (TPR), where

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.$$
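These metrics are straightforward to compute from the model's outputs. A sketch using scikit-learn for the ROC/AUC part, assuming the label convention 1 = malignant (positive) and a 0.5 decision threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, recall and AUC from labels and predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # Equation (8)
    recall = tp / (tp + fn)                      # Equation (9)
    fpr, tpr, _ = roc_curve(y_true, y_prob)      # ROC coordinates
    return accuracy, recall, auc(fpr, tpr)
```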
4.3 Comparative experiments and result analysis

In this section, we set up different comparative experiments to verify the feasibility of the proposed method. Ten-fold cross-validation is used to evaluate the proposed model on the 10 subsets into which the data were previously divided. We compare and analyse different network models, pre-training methods and fusion strategies.

4.3.1 Comparative analysis of different network models

We used single-view and mixed-view settings to analyse and evaluate the performance of different networks. We tested three different network structures, and Table 2 gives the relevant experimental results. We fine-tuned the original VGGNet and ResNet, changed the