Skin Lesion Diagnosis Using Ensemble Deep Learning Models


Mehdi Yousefzadeh 1,2, Parsa Esfahanian 1,*, Saeid Rahmani 1, Hossein Motahari 2, Dara Rahmati 1,3, Saeid Gorgin 1,4

1 School of Computer Science, Institute for Research in Fundamental Sciences, Tehran, Iran

2 Faculty of Physics, Shahid Beheshti University, Tehran, Iran

3 Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

4 Iranian Research Organization for Science and Technology, Tehran, Iran

How to Cite: Yousefzadeh M, Esfahanian P, Rahmani S, Motahari H, Rahmati D, et al. Skin Lesion Diagnosis Using Ensemble Deep Learning Models. Iran J Radiol. 2019;16(Special Issue):e99142. doi: 10.5812/iranjradiol.99142.


Iranian Journal of Radiology: 16 (Special Issue); e99142
Published Online: December 10, 2019
Article Type: Abstract
Received: October 26, 2019
Accepted: December 10, 2019


Background: Skin cancer is a serious public health concern (1). With over 5 million newly diagnosed cases every year, it is the most common form of cancer worldwide (3). Among the categories of skin cancer, melanoma is the deadliest. It was estimated to cause 7,230 deaths in the United States in 2019 (4). Although melanoma mortality is significant, the survival rate exceeds 95% when the disease is detected early (5,6).

Objectives: We propose a deep learning framework that accurately diagnoses skin lesions by classifying dermoscopic images into seven lesion categories: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, and vascular lesion.

Methods: To classify dermoscopic images, we used convolutional neural network (CNN) models with high Top-1 accuracy on the ImageNet dataset: InceptionResNetV2, Xception, and EfficientNetB3. All models were initialized with their ImageNet pre-trained weights and trained with the categorical cross-entropy loss and the Adam optimizer with standard parameters. We employed two ensemble methods. The first, called Softmax-only, used Xception and EfficientNetB3, each with a softmax activation in its prediction layer. In this method, each model saved a checkpoint of itself and recorded its balanced accuracy at every training epoch. After training, the checkpoints with some of the highest balanced accuracies were selected for each model, and the average balanced accuracy of the selected checkpoints was reported as the method’s performance. The second method, called Sigmoid-only, used InceptionResNetV2, Xception, and EfficientNetB3. It followed the same procedure as the first, except that all models had a sigmoid activation in their prediction layers and the evaluation criterion was the F1-score. The dataset was the ISIC 2018 archive, which included 25331 dermoscopic images in the training set and 1516 images in the test (evaluation) set (1). For the Softmax-only method, the training set was split 80%/20% for cross-validation; for the Sigmoid-only method, it was split 85%/15% into fixed training and validation sets. The training set was augmented with random cropping, random rotation, and random flipping.
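The two ensemble methods differ mainly in the activation of each model’s prediction layer. The NumPy sketch below applies both activations to hypothetical logits for two images. It illustrates the general technique and is not the authors’ code. Softmax turns the seven class scores into a distribution that sums to 1, whereas sigmoid scores each class independently in (0, 1). Both activations preserve the ranking of the logits, so the top-scoring class is the same.

```python
import numpy as np

# The seven lesion classes, in the order listed in the Objectives.
CLASSES = ["melanoma", "melanocytic nevus", "basal cell carcinoma",
           "actinic keratosis", "benign keratosis", "dermatofibroma",
           "vascular lesion"]


def softmax(logits):
    """Row-wise softmax: the class scores compete and each row sums to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sigmoid(logits):
    """Element-wise sigmoid: each class score lies in (0, 1) independently."""
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical prediction-layer logits for two images (one row per image).
logits = np.array([[2.0, 0.5, -1.0, -1.0, 0.0, -2.0, -2.0],
                   [0.1, 3.0, -0.5, -1.5, 0.2, -1.0, -3.0]])

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print("softmax row sums:", p_softmax.sum(axis=1))
print("sigmoid row sums:", p_sigmoid.sum(axis=1))
# Both activations pick the same class: ['melanoma', 'melanocytic nevus']
print("predicted classes:", [CLASSES[i] for i in p_softmax.argmax(axis=1)])
```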
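The checkpoint-selection step of the Softmax-only method can be sketched as follows. The sketch assumes that each model’s per-epoch validation balanced accuracies are available as a list. Several details are hypothetical, because the abstract only says that checkpoints with “some of the highest” scores were kept:

- the number of kept checkpoints (`k=2`)
- pooling both models’ selected scores into one average
- using the sample standard deviation for the ± term
- the per-epoch values themselves

The same routine would apply to the Sigmoid-only method, with F1-scores in place of balanced accuracy.

```python
import statistics


def select_top_checkpoints(history, k):
    """Return the k (epoch, score) pairs with the highest validation score.

    history[i] is the score recorded after epoch i + 1.
    """
    ranked = sorted(enumerate(history, start=1), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def summarize(selected):
    """Mean and sample standard deviation of the selected checkpoint scores."""
    scores = [score for _, score in selected]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-epoch validation balanced accuracies.
histories = {
    "Xception": [0.71, 0.80, 0.85, 0.88, 0.87, 0.90],
    "EfficientNetB3": [0.74, 0.82, 0.86, 0.89, 0.91, 0.90],
}

selected = []
for model, history in histories.items():
    top = select_top_checkpoints(history, k=2)  # k is an assumption
    selected.extend(top)
    print(model, "keeps epochs", [epoch for epoch, _ in top])

mean, std = summarize(selected)
print(f"balanced accuracy: {mean:.3f} (+/- {std:.3f})")
```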
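The data preparation can be sketched as follows, assuming images are NumPy arrays of shape (height, width, channels). The helper `split_indices` shuffles the 25331 training indices into 80%/20% and 85%/15% splits. The hypothetical `augment` function then applies a random crop, a random rotation, and random horizontal and vertical flips.

The abstract gives no crop size or rotation range, so two choices here are assumptions:

- the 224-pixel crop
- restricting rotations to multiples of 90 degrees, which keeps the crop square without interpolation

The sketch also shows a single shuffled hold-out split rather than how the cross-validation folds were formed.

```python
import numpy as np


def split_indices(n, val_fraction, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, validation) arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_val = int(round(n * val_fraction))
    return order[n_val:], order[:n_val]


def augment(image, crop_size, rng):
    """Random crop, random 90-degree rotation and random flips of an H x W x C array."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    out = image[top:top + crop_size, left:left + crop_size]
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # 0, 90, 180 or 270 degrees
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    return out


n_images = 25331  # training-set size reported in the abstract
train_a, val_a = split_indices(n_images, 0.20)  # Softmax-only: 80% / 20%
train_b, val_b = split_indices(n_images, 0.15)  # Sigmoid-only: 85% / 15%
print(len(train_a), len(val_a))  # 20265 5066
print(len(train_b), len(val_b))  # 21531 3800

rng = np.random.default_rng(42)
image = rng.random((300, 300, 3))  # stand-in for a dermoscopic image
patch = augment(image, crop_size=224, rng=rng)
print(patch.shape)  # (224, 224, 3)
```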

Results: The Softmax-only method achieved a balanced accuracy of 0.901 (± 0.12), and the Sigmoid-only method achieved an average F1-score of 0.932 over the seven classes. On the test set, the framework achieved a balanced accuracy of 0.866. The framework was also entered in Task 3 of the “ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection” challenge, where it ranked first on the challenge’s live leaderboard (up until 18 September 2019; link:
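Balanced accuracy is the mean of the per-class recalls, so every class counts equally regardless of how many images it contains. The average F1-score is taken here as the unweighted (macro) mean of the per-class F1-scores. The NumPy sketch below computes both metrics from integer class labels on a toy three-class example. It assumes that each prediction is the argmax class of the model output, because the abstract does not say how sigmoid outputs were thresholded.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def recall_precision(cm):
    """Per-class recall and precision (0 where the denominator is 0), plus support."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # number of true samples per class
    predicted = cm.sum(axis=0)  # number of predictions per class
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    return recall, precision, support


def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean per-class recall over the classes present in y_true."""
    recall, _, support = recall_precision(confusion_matrix(y_true, y_pred, n_classes))
    return recall[support > 0].mean()


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1-scores."""
    recall, precision, _ = recall_precision(confusion_matrix(y_true, y_pred, n_classes))
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    return f1.mean()


# Toy, imbalanced example with three classes (class 0 dominates).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 2])

accuracy = np.mean(y_true == y_pred)
print(f"accuracy          {accuracy:.3f}")                               # 0.800
print(f"balanced accuracy {balanced_accuracy(y_true, y_pred, 3):.3f}")  # 0.667
print(f"macro F1          {macro_f1(y_true, y_pred, 3):.3f}")           # 0.730
```

On this toy data, plain accuracy is 0.800, but balanced accuracy drops to 0.667. The drop occurs because half of the class 1 and class 2 images are misclassified, which plain accuracy hides behind the large class 0.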

Conclusion: We proposed a deep learning framework that classifies dermoscopic images from the ISIC 2018 archive by skin lesion category. On validation, our methods achieved a balanced accuracy of 0.901 (± 0.12) and an average F1-score of 0.932; on the test set, the framework achieved a balanced accuracy of 0.866. The framework also ranked first in the “ISIC 2018 Task 3” challenge.

To see tables and references, please refer to the PDF file.

Copyright © 2019, Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, which permits copying and redistributing the material for noncommercial use only, provided the original work is properly cited.