A new nested ensemble technique for automated diagnosis of breast cancer☆
Introduction
Globally, breast cancer comprises approximately 15% of all cancers affecting females [1]. Approximately 1 in 37 breast cancer patients will die as a result of the disease, and it has been cited as the second most common cause of cancer-related death among females [2]. Breast cancer can occur in females of any age, but most commonly affects females between the ages of 15 and 54 years [3]. Preventative screening is key to the early detection and treatment of breast cancers, and many countries around the world have successfully initiated screening programs that have resulted in an almost one-third reduction in the burden of disease [4].
Several techniques can be used to distinguish benign breast tumors from malignant tumors that will go on to infiltrate other organs. Fine-needle aspiration cytology (FNAC) and mammography are two well-known and widely used procedures for diagnosing breast cancer, but both suffer from unsatisfactory diagnostic performance. In mammography, for example, doctors examine an X-ray image of the breast for signs of breast cancer. However, doctors' interpretations of the same mammogram may vary, and mammography also suffers from limitations such as false-negative and false-positive results [5]. For FNAC, a pathologist, a radiologist and an oncologist together make the final decision in breast cancer diagnosis. This process is time consuming, and errors due to fatigue or inexperience are possible. Therefore, developing techniques for intelligent automated prediction of breast cancer disease pathways would be of great benefit to the medical field. Intelligent automated prediction systems based on data mining and machine learning could improve cancer diagnosis capability and reduce diagnostic errors. Moreover, these systems can provide decision support for doctors, creating an opportunity for early identification of breast cancer.
Data mining is a process that uses available data to find hidden, useful information that may not be directly recognizable [6]. It has been successfully applied to predicting outcomes related to liver disease [7], [8], heart disease [9], [10], [11], Parkinson's disease [12], cerebral palsy [13] and epileptic seizure [14], as well as other types of cancer, including lung [15], oropharyngeal and thyroid cancers [16], [17]. For breast cancer in particular, a number of data mining and machine learning techniques have already been applied to develop automated diagnosis models. Naïve Bayes, BayesNet, logistic regression, decision tree, K-Nearest Neighbor, neural networks, AdaBoost algorithms and Support Vector Machine (SVM) [18], [19], [20] are widely applied for breast cancer detection by finding patterns in input data according to given classes. These models are limited in that they have a fixed loop, which does not allow for further shaping and accuracy of the algorithm. In this study, we propose a new data mining and machine learning technique that allows for more accurate prediction of outcomes.
In this study, we propose a new nested ensemble (NE) technique and use it to create an accurate, automatic prediction model that can distinguish benign breast tumors from malignant ones. The objective of these predictions is to classify patients into benign and malignant categories, thereby allowing those with benign breast tumors to avoid or minimize the extent of invasive procedures they will have to undergo. The proposed approach allows us to apply several ensemble methods at the same time to improve the performance of the prediction system.
The rest of this paper is organized as follows. Section 2 briefly introduces related work. Section 3 presents the proposed nested ensemble methods. Section 4 describes our experiments on the WDBC dataset. Section 5 presents the experimental results and discusses the outcomes. Finally, Section 6 concludes the work and outlines future work.
Section snippets
Data mining applications in breast cancer
Classification is one of the most important supervised data analysis techniques. A number of classification algorithms, such as neural networks, AdaBoost algorithms, Support Vector Machine (SVM), KNN and K*Tree, as well as feature selection methods [21], [22], [23], [24], have been applied in a variety of research fields [25], [26], [27], [28], [29]. In this section, we briefly review some breast cancer applications that use a wide variety of data mining algorithms which can be effectively used to
Proposed nested ensemble model
This research introduces a new approach to ensemble classifiers, called the nested ensemble (NE) method. In this section, we first give the formal definition of the research problem and then present the nested ensemble (NE) algorithm.
Consider a set of datasets, a set of different algorithms and a set of different ensemble learning techniques. We propose a new model that can combine two or more ensemble learning
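The idea of nesting can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that every ensemble exposes the same prediction interface as a base classifier, so an ensemble can itself be a member of another ensemble. The `threshold_rule` base learners, the cutoff values and the use of majority voting at both levels are hypothetical choices made only for illustration; the learners and ensemble techniques actually used in the NE method are not visible in this excerpt.

```python
from collections import Counter
from typing import Callable, Sequence

Sample = Sequence[float]
Classifier = Callable[[Sample], str]  # maps a feature vector to "M" or "B"


def threshold_rule(feature: int, cutoff: float) -> Classifier:
    """Hypothetical base classifier: malignant if one feature exceeds a cutoff."""
    return lambda x: "M" if x[feature] > cutoff else "B"


def voting(members: Sequence[Classifier]) -> Classifier:
    """Ensemble that returns the majority label of its members.

    The result is again a Classifier, so it can be passed as a member
    of another ensemble; that is what makes nesting possible.
    """
    def predict(x: Sample) -> str:
        votes = Counter(member(x) for member in members)
        return votes.most_common(1)[0][0]
    return predict


# Two inner ensembles, each built from three illustrative base rules.
inner_a = voting([threshold_rule(0, 15.0), threshold_rule(1, 20.0),
                  threshold_rule(2, 100.0)])
inner_b = voting([threshold_rule(0, 13.0), threshold_rule(2, 90.0),
                  threshold_rule(3, 700.0)])

# Outer ensemble whose members are themselves ensembles plus one base rule.
nested = voting([inner_a, inner_b, threshold_rule(3, 800.0)])

clearly_malignant = (17.0, 22.0, 120.0, 1000.0)  # made-up feature values
clearly_benign = (11.0, 15.0, 70.0, 400.0)       # made-up feature values
borderline = (14.0, 21.0, 95.0, 750.0)           # made-up feature values

for sample in (clearly_malignant, clearly_benign, borderline):
    print(sample, "->", nested(sample))
```

An odd number of members is used at each level so that a two-class majority vote cannot tie. Replacing `voting` at either level with a different combiner would give other nested configurations under the same interface.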
Dataset
The Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used in our experiments. This dataset was obtained from the University of California, Irvine (UCI) machine learning repository [55]. It includes 32 attributes for each of 569 subjects: 30 actual tumor features, a subject ID number and a class label that indicates whether the subject has a benign or malignant tumor. In this dataset, 10 real-valued factors are evaluated for each cell nucleus; these are displayed in Table 2.
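The record layout described above can be sketched as follows. The sketch assumes the comma-separated layout documented by the UCI repository for this dataset: the subject ID, then the diagnosis (`M` or `B`), then 30 values formed by the mean, standard error and worst value of each of the 10 factors. The factor names are taken from that documentation rather than from Table 2, which is not reproduced here, and the sample record is made up purely to exercise the parser.

```python
from typing import Dict, Tuple

# The 10 real-valued factors computed for each cell nucleus (assumed from
# the UCI dataset documentation, in the order used by the data file).
FACTORS = ["radius", "texture", "perimeter", "area", "smoothness",
           "compactness", "concavity", "concave_points", "symmetry",
           "fractal_dimension"]
# Each factor is reported three times: mean, standard error, worst value.
STATISTICS = ["mean", "se", "worst"]
FEATURE_NAMES = [f"{factor}_{stat}" for stat in STATISTICS for factor in FACTORS]


def parse_record(line: str) -> Tuple[str, str, Dict[str, float]]:
    """Split one 32-field record into subject ID, class label and 30 features."""
    fields = line.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 fields, got {len(fields)}")
    subject_id, label = fields[0], fields[1]
    if label not in ("M", "B"):
        raise ValueError(f"unexpected class label: {label!r}")
    values = [float(v) for v in fields[2:]]
    return subject_id, label, dict(zip(FEATURE_NAMES, values))


# Made-up record (not real patient data): ID, label, then the values 1.0 to 30.0.
sample_line = "0001,M," + ",".join(str(float(i)) for i in range(1, 31))
subject_id, label, features = parse_record(sample_line)
print(subject_id, label, len(features))
```

Only the 30 feature values would be passed to a classifier; the subject ID carries no diagnostic information and the class label is the prediction target.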
Evaluation matrix
A
Results without ensemble technique
In this section, the experiment concentrates on evaluating the prediction performance of the BayesNet and Naïve Bayes classifiers. The experimental outcomes obtained for the WDBC dataset are presented in Table 3. It can be seen from Table 3 that the BayesNet algorithm performed better than the Naïve Bayes algorithm. The highest accuracy for the BayesNet algorithm is 95.25% when whereas the best accuracy for the Naïve Bayes algorithm is 93.32% when .
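A baseline evaluation of this kind can be sketched as below. This is a minimal illustration rather than the experimental setup of the paper: it implements only a Gaussian Naïve Bayes classifier (BayesNet is not covered), estimates accuracy by K-fold cross-validation, and runs on synthetic two-class data instead of the WDBC records. The function names, the choice of `k=10` and the synthetic cluster centres are all assumptions made for the example.

```python
import math
import random
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[float, List[Tuple[float, float]]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[str]) -> Model:
    """Estimate the class prior and per-feature mean and variance for each class."""
    model: Model = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        # A small constant keeps the variance strictly positive.
        stats = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model: Model, x: Sequence[float]) -> str:
    """Return the class with the highest log posterior under feature independence."""
    def log_posterior(label: str) -> float:
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def k_fold_accuracy(X, y, k: int, seed: int = 0) -> float:
    """Average held-out accuracy over k folds of a shuffled index list."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Synthetic, well-separated two-class data (not the WDBC records).
rng = random.Random(1)
X = [[rng.gauss(10, 1), rng.gauss(15, 1)] for _ in range(30)]
X += [[rng.gauss(20, 1), rng.gauss(25, 1)] for _ in range(30)]
y = ["B"] * 30 + ["M"] * 30

accuracy = k_fold_accuracy(X, y, k=10)
print(f"10-fold accuracy: {accuracy:.4f}")
```

Because each fold is scored by a model that never saw it during training, the averaged accuracy is a less optimistic estimate than accuracy measured on the training data. Repeating the call with different values of `k` would show how sensitive the estimate is to the fold count.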
Nested ensemble with 2-MetaClassifier
We reported the performances of
Conclusion
Timely and accurate diagnosis of different diseases is a main challenge in the healthcare research area. Breast cancer is one of the major causes of female deaths all over the world, leading to significant health interest in this domain. Significantly, this paper has introduced a new hybrid ensemble technique in order to improve the classification algorithms for early diagnosis of breast cancer. In this regard, the study evaluated the performance of hybrid ensemble methods using various K-fold
Acknowledgments
This paper is partially supported by the Commonwealth Innovation Connections Grant, Australia (no. RC54960).
References (69)
- Optimal breast cancer classification using Gauss–Newton representation based algorithm. Expert Syst. Appl. (2017)
- Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals. Swarm Evol. Comput. (2018)
- Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst. Appl. (2018)
- Gait classification in children with cerebral palsy by Bayesian approach. Pattern Recognit. (2009)
- Data mining identifies the base of the heart as a dose-sensitive region affecting survival in lung cancer patients. Int. J. Radiat. Oncol. Biol. Phys. (2016)
- Breast cancer diagnosis based on feature extraction using a hybrid of k-means and support vector machine algorithms. Expert Syst. Appl. (2014)
- Associations between exposure to and expression of negative opinions about human papillomavirus vaccines on social media: an observational study. J. Med. Internet Res. (2015)
- Citations alone were enough to predict favorable conclusions in reviews of neuraminidase inhibitors. J. Clin. Epidemiol. (2015)
- Sentiment analysis for depression detection on social networks. International Conference on Advanced Data Mining and Applications (2016)
- Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognit. Lett. (2003)
- An interpretable fuzzy rule-based classification methodology for medical diagnosis. Artif. Intell. Med.
- Artificial immune system classification of multiple-class problems. In Proc. of Intelligent Engineering Systems
- Large margin classifiers based on affine hulls. Neurocomputing
- Ensemble-based classifiers. Artif. Intell. Rev.
- An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn.
- Solving large scale linear prediction problems using stochastic gradient descent algorithms. Proceedings of the Twenty-first International Conference on Machine Learning
- Logistic model trees. Mach. Learn.
- Differential evolution based nearest prototype classifier with optimized distance measures for the features in the data sets. Expert Syst. Appl.
- Statistical computation of feature weighting schemes through data estimation for nearest neighbor classifiers. Pattern Recognit.
- A weighted inference engine based on interval-valued fuzzy relational theory. Expert Syst. Appl.
- Scaled radial axes for interactive visual feature selection: a case study for analyzing chronic conditions. Expert Syst. Appl.
- Context-based probability neural network classifiers realized by genetic optimization for medical decision making. Multimedia Tools Appl.
- Local and global structure preservation for robust unsupervised spectral feature selection. IEEE Trans. Knowl. Data Eng.
- Breast cancer diagnosis using GA feature selection and rotation forest. Neural Comput. Appl.
- Modified bat algorithm for feature selection with the Wisconsin diagnosis breast cancer (WDBC) dataset. Asian Pacific J. Cancer Prev.
- Variability in radiologists’ interpretations of mammograms. N. Engl. J. Med.
- Data Mining: Concepts, Models, Methods, and Algorithms
- Rule optimization of boosted C5.0 classification using genetic algorithm for liver disease prediction. Computer and Applications (ICCA), 2017 International Conference on
- Improving the diagnosis of liver disease using multilayer perceptron neural network and boosted decision trees. J. Med. Biol. Eng.
- Using PSO algorithm for producing best rules in diagnosis of heart disease. Computer and Applications (ICCA), 2017 International Conference on
- Impact of patients gender on parkinsons disease using classification algorithms. J. AI Data Mining
- A computer aided analysis scheme for detecting epileptic seizure from EEG data. Int. J. Comput. Intell. Syst.
- Intelligent classification of lung & oral cancer through diverse data mining algorithms. Micro-Electronics and Telecommunication Engineering (ICMETE), 2016 International Conference on
☆ I, Xujuan, hereby confirm on behalf of all authors that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.