Elsevier

Knowledge-Based Systems

Volume 109, 1 October 2016, Pages 187-197

Coronary artery disease detection using computational intelligence methods

https://doi.org/10.1016/j.knosys.2016.07.004

Abstract

Cardiovascular diseases are very common and are among the main causes of death worldwide. One major type is coronary artery disease (CAD). The most accurate method for diagnosing CAD is angiography, which carries significant complications and costs. Researchers are therefore seeking novel modalities for CAD diagnosis via data mining methods. To that end, several algorithms and datasets have been developed. However, few studies have considered the stenosis of each major coronary artery separately. We attempted to achieve high accuracy in diagnosing the stenosis of each major coronary artery. Analytical methods were used to investigate the importance of features for artery stenosis. Further, a proposed classification model was built to predict the status of each artery in new visitors. To further enhance the models, a proposed feature selection method was employed to select more discriminative feature subsets for each artery. According to the experiments, accuracy rates of 86.14%, 83.17%, and 83.50% were achieved for the diagnosis of the stenosis of the left anterior descending (LAD) artery, left circumflex (LCX) artery, and right coronary artery (RCA), respectively. To the best of our knowledge, these are the highest accuracy rates reported in the literature so far. In addition, a number of rules with high confidence were introduced for deciding whether the arteries were stenotic or not. We also applied the proposed method to two challenging datasets and obtained the best accuracy in comparison with other methods.

Introduction

Data mining methods, which discover relations hidden in a dataset, are utilized in different fields, from banking to insurance. Classification is one such method: it builds a model from a set of labeled data in order to assign labels to a set of unlabeled data records [1].

Today, using different machine learning methods is common in disease diagnosis [2]. Some notable machine learning methods are: Decision Tree [3], Neural Networks, Bayesian Networks [4], and Support Vector Machine (SVM) [5], [6], [7].

The search for the causes of heart disease, and for diagnostic methods with fewer complications and higher accuracy, is still ongoing using machine learning and data mining techniques [8], [9], [10], [11], [12]. Angiography is currently deemed the most accurate method for the diagnosis of coronary artery disease (CAD) [13], [14]. However, the invasive nature of this diagnostic modality has prompted researchers to seek less invasive methods with the aid of data mining. A patient is considered to have CAD if at least one of the LAD, LCX, or RCA arteries is more than 50% stenotic. Coronary arteries supply blood to the heart muscles. The two main coronary arteries are the left and right coronary arteries. The left coronary artery arises from the aorta above the left cusp of the aortic valve and feeds blood to the left side of the heart. It typically runs for 10–25 mm and then bifurcates into the left anterior descending (LAD) artery and the left circumflex (LCX) artery. The right coronary artery (RCA) originates above the right cusp of the aortic valve. It travels down the right atrioventricular groove, towards the crux of the heart [15].

Data mining techniques can evaluate factors contributing to cardiac disease with high accuracy rates. The literature contains several studies on the diagnosis of CAD. Polat et al. [16] utilized clinical information, the Artificial Immune Recognition System (AIRS), and the K Nearest Neighbor (KNN) to present a system for CAD diagnosis and attained an accuracy rate of 87%. Kara et al. [17] opted for the Doppler Signal and the Neural Network to achieve optimum diagnostic accuracy for CAD. Babaoglu et al. [18] employed the exercise test data and the Support Vector Machine (SVM) and achieved 81.46% accuracy for the diagnosis of CAD. Das et al. [19] used the Cleveland dataset [20], achieving 89.01% accuracy for CAD diagnosis by means of several Neural Networks. Different feature selection methods such as the CBA program [21], filter method [22], genetic algorithm [23], wrapper method [24], and numerical and nominal attribute selection [25] have been used for artery stenosis disease prediction. In [26], CAD is diagnosed using a new feature creation method.

In [27], a computer-aided diagnostic technique was proposed based on several grayscale features extracted from echocardiography images of normal and CAD subjects. The features were extracted from echocardiography images belonging to a database of 400 normal cases and 400 CAD patients, and the authors achieved 100% accuracy, sensitivity, and specificity for CAD detection using a Gaussian Mixture Model classifier. In [28], HR signals obtained from ECG data recorded from normal and CAD subjects were analyzed in the time, frequency, and non-linear domains. The results showed that HR signals were less variable in CAD subjects than in normal subjects. The data included a total of 61 normal and 82 CAD subjects. In [29], ten features were extracted from full time series heart rate data. Using a Gaussian Mixture Model classifier, the authors could differentiate between the two classes with a clinically significant classification accuracy of 96.8%, sensitivity of 100%, and specificity of 93.7%.

A novel method based on tunable-Q wavelet transform and correntropy to detect CAD subjects using heart rate signals was presented in [30]. The authors obtained an average classification accuracy of 99.7%, sensitivity of 99.6%, and specificity of 99.8%. The ECG signals were obtained from Iqraa Hospital, Calicut, Kerala, India. The 143 ECG files comprised 61 files from normal subjects and 82 files from CAD patients. The authors of [31] used a dataset with 23 features obtained from patients who had undergone exercise stress testing and coronary angiography. They achieved an accuracy rate of 81.46% using binary particle swarm optimization as a feature selection model for determining the existence of coronary artery disease from exercise stress testing data. In [32], ECG stress signals were used for the diagnosis of coronary artery disease. The authors used combined uncertainty for ECG signal representation. Combined uncertainty computes a composite of two types of uncertainty, fuzzy and probabilistic. This type of modeling has shown success in extracting SSI feature values, which could be the first step in the automatic classification of stress ECG signals. In the best case, the correct classification percentage was 80%.

Lee et al. [33] proposed multi-parametric features of Heart Rate Variability from the ECG biosignal. Several supervised methods including Naïve Bayesian, decision tree (C4.5), associative classifier, and SVM were used. In the experimental results, SVM performed best, with an accuracy of 90.9%. The dataset included 99 patients with CAD and 94 patients with normal coronary arteries.

Kim et al. [34] used various linear and nonlinear measures of heart rate variability to develop a multi-parametric measure of heart rate variability for diagnosing cardiovascular disease. Twenty control subjects, 51 patients with angina pectoris, and 13 patients with acute coronary syndrome participated in this study. The hit rates for the control, angina pectoris, and acute coronary syndrome groups were 75.0%, 72.5%, and 84.6%, respectively. Overall, the hit rate was 75.0%, i.e. 63 of the 84 original grouped cases were classified correctly. Zhao et al. [35] proposed an intelligent diagnosis system to diagnose CAD using Empirical Mode Decomposition–Teager Energy Operator to estimate the instantaneous frequency of diastolic murmurs. They also used a Back-Propagation neural network to classify the murmurs. They tested their method on a dataset containing 40 cases, 20 normal and 20 CAD. The diagnosis rate was over 85%.
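The overall hit rate quoted for Kim et al. [34] can be checked against the per-group figures. The short sketch below recovers the number of correctly classified cases in each group by rounding group size times hit rate (the rounding step is our assumption, since only percentages are quoted) and confirms the total of 63 out of 84. The function and variable names are illustrative only.

```python
# Group sizes and per-group hit rates as quoted in the text for Kim et al. [34].
groups = {
    "control": (20, 0.750),
    "angina pectoris": (51, 0.725),
    "acute coronary syndrome": (13, 0.846),
}


def correct_counts(groups):
    """Estimate correctly classified cases per group as round(size * rate).

    Rounding is an assumption: the text reports percentages, not counts.
    """
    return {name: round(size * rate) for name, (size, rate) in groups.items()}


def overall_totals(groups):
    """Return (correctly classified cases, total cases) across all groups."""
    counts = correct_counts(groups)
    total_cases = sum(size for size, _ in groups.values())
    return sum(counts.values()), total_cases


counts = correct_counts(groups)
correct, total = overall_totals(groups)
print(counts)
print(f"{correct}/{total} = {100 * correct / total:.1f}%")
```

The recovered counts (15, 37, and 11) sum to 63, which matches the overall figure reported in the text.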

As mentioned before, a patient is considered to have CAD if at least one of the LAD, LCX, or RCA arteries is more than 50% stenotic. Although all of the above-mentioned studies diagnose CAD, they do not detect which arteries are stenosed. As far as we know, the existing literature contains only two studies addressing the diagnosis of the stenosis of the LAD, LCX, and RCA through data mining methods [36], [37].
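The difference between a patient-level CAD label and per-artery labels can be made concrete with a small sketch. It applies the "more than 50%" rule stated above to hypothetical stenosis percentages; the dictionary layout and function names are our own illustration, not the encoding used in the dataset.

```python
ARTERIES = ("LAD", "LCX", "RCA")
STENOSIS_THRESHOLD = 50.0  # percent; an artery counts as stenotic strictly above this


def stenotic_arteries(stenosis_percent):
    """Return the arteries whose stenosis exceeds the threshold.

    `stenosis_percent` maps artery name to percent stenosis (hypothetical layout).
    Missing arteries are treated as 0% stenosis.
    """
    return [a for a in ARTERIES if stenosis_percent.get(a, 0.0) > STENOSIS_THRESHOLD]


def has_cad(stenosis_percent):
    """Patient-level label: CAD if at least one major artery is stenotic."""
    return len(stenotic_arteries(stenosis_percent)) > 0


patient = {"LAD": 70.0, "LCX": 20.0, "RCA": 55.0}  # made-up values
print(has_cad(patient), stenotic_arteries(patient))
```

A classifier trained only on the output of `has_cad` cannot tell the two affected arteries in this example apart from a single-artery case, which is the gap the per-artery models are meant to close.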

The present study employs the Support Vector Machine (SVM) on the Z-Alizadeh Sani dataset [26] and discusses the impact of the features on the stenosis of each of the LAD, LCX and RCA.

We have used two approaches to select features. (1) A feature selection method selects (possibly) different features for each artery separately. For this approach, we applied weights by SVM [6], a method that has been shown to be effective for predicting CAD [26]. (2) A feature selection method selects the same features for all arteries. For this purpose, two methods are used: average information gain and combined information gain.
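As a rough illustration of the second approach, the sketch below computes the information gain of one discrete feature with respect to each artery label and then averages the three values. The exact definitions of average and combined information gain are not given in this excerpt, so taking the arithmetic mean of the per-artery gains is our assumption, and combined information gain is not sketched. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(subset) / total * entropy(subset) for subset in by_value.values())
    return entropy(labels) - conditional


def average_information_gain(feature_values, labels_by_artery):
    """Assumed definition: mean of the feature's information gain over all arteries."""
    gains = [information_gain(feature_values, labels) for labels in labels_by_artery.values()]
    return sum(gains) / len(gains)


# Toy example: one binary feature and stenosis labels (1 = stenotic) for four patients.
feature = [0, 0, 1, 1]
labels_by_artery = {"LAD": [0, 0, 1, 1], "LCX": [0, 1, 0, 1], "RCA": [0, 0, 1, 1]}
print(average_information_gain(feature, labels_by_artery))
```

Under this reading, ranking features by the averaged score yields a single feature subset shared by all three arteries. Continuous features would need to be discretized first, which the sketch does not cover.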

Then, the classification algorithms are tested on the selected sets of features using 10-fold cross validation and the results are compared with each other. Finally, association rule mining algorithms (Apriori algorithm) are used to extract rules for the stenosis of coronary arteries.
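The 10-fold cross validation protocol can be sketched in a few lines. The version below shuffles the record indices, splits them into ten folds of near-equal size, and averages a score over the ten train/test splits. The `train_and_score` callback is a hypothetical placeholder for fitting and evaluating any classifier, and the fixed seed and unstratified split are our assumptions rather than details taken from the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the score returned by train_and_score(train_idx, test_idx) over k folds.

    Each fold serves as the test set exactly once; the other k-1 folds form the training set.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# 303 is the number of patient records reported for the Z-Alizadeh Sani dataset.
folds = k_fold_indices(303)
print(sorted(len(fold) for fold in folds))
```

With 303 records, three folds hold 31 records and seven hold 30, so every record is used for testing exactly once.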

The rest of this paper is organized as follows. Section II introduces the medical dataset used. Section III describes the proposed methods, and Section IV presents the experimental evaluation and results. Section V discusses related work. Finally, Section VI concludes the paper.

Medical dataset used

The Z-Alizadeh Sani dataset was collected from a random sample of 303 patients and has 54 features. We introduced this dataset in our previous work [26]. It was collected for CAD diagnosis. The features, along with their valid ranges, are presented in Table 1. The selected features comprise the most important and relevant findings from the patients’ medical histories, physical examinations, laboratory data, ECGs, and echocardiograms.

Some of the features in the Table 1 should be further

Method

In this section, the data mining methods employed to analyze the dataset are discussed.

Experimental results

In this section, we present the experimental results obtained by the proposed method. Our experiments were performed on a PC with 3.30 GHz Intel Core i3 CPU and 4 GB RAM, using Windows 7 operating system. The RapidMiner tool was used for defining the Information gain of the dataset features. RapidMiner is an environment for machine learning, data mining, text mining, and business analytics. It is used for research, education, training, and industrial applications [52]. In this study, version

Discussion of related work

As mentioned in section I, several studies have addressed CAD diagnosis. They detect whether there is stenosis in at least one of the LAD, LCX, and RCA arteries, but they do not identify which artery is blocked. In medicine, it is important to know which arteries are affected.

As far as we know, only two studies have addressed diagnosing which artery is stenosed. Babaoglu et al. [36] used the Neural Network algorithm to assess data obtained from the exercise test. The data was studied by

Conclusion and future work

In this paper, two feature selection methods were proposed: average information gain and combined information gain. Each method selected 24 features. The results of the classification algorithms showed that these two selection methods led to better accuracy than those selecting separate sets of features for each artery.

The proposed method is an extension of SVM. First, the distance of each sample from the separating hyperplane is measured. Unlike SVM, we not only decide based on sign of record but also

References (60)

  • R. Das et al.

    Effective diagnosis of heart disease through neural networks ensembles

    Exp. Syst. Appl.

    (Mar. 2009)
  • K. Polat et al.

    A hybrid approach to medical decision support systems: combining feature selection, fuzzy weighted pre-processing and AIRS

    Comput. Methods Prog. Bio.

    (Apr. 2007)
  • R. Alizadehsani

    A data mining approach for diagnosis of coronary artery disease

    Comput. Methods Prog. Biomed.

    (Jul. 2013)
  • U.R. Acharya et al.

    Automated classification of patients with coronary artery disease using grayscale features from left ventricle echocardiographic images

    Comput. Methods Prog. Biomed.

    (Jul. 2013)
  • U.R. Acharya et al.

    Linear and nonlinear analysis of normal and CAD-affected heart rate signals

    Comput. Methods Prog. Biomed.

    (Aug. 2014)
  • D. Giri et al.

    Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and Discrete Wavelet Transform

    Knowl.-Based Syst.

    (Oct. 2013)
  • I. Babaoglu et al.

    A comparison of feature selection models utilizing binary particle swarm optimization and genetic algorithm in determining coronary artery disease using support vector machine

    Exp. Syst. Appl.

    (Apr. 2010)
  • I. Babaoglu

    Assessment of exercise stress testing with artificial neural network in determining coronary artery disease and predicting lesion localization

    Exp. Syst. Appl.

    (Feb. 2009)
  • I. Alberto et al.

    A comparative study of variation operators used for evolutionary multi-objective optimization

    Inf. Sci.

    (Jan. 2014)
  • W.K.J. Assuncao et al.

    A multi-objective optimization approach for the integration and test order problem

    Inf. Sci.

    (Sep. 2014)
  • E. Baralis et al.

    Generalized association rule mining with constraints

    Inf. Sci.

    (Feb. 2012)
  • P.Y. Hsu et al.

    Algorithms for mining association rules in bag databases

    Inf. Sci.

    (Sep. 2004)
  • R. Wang et al.

    A vector-valued support vector machine model for multiclass problem

    Inf. Sci.

    (Jan. 2013)
  • J. Zhang et al.

    A rough margin based support vector machine

    Inf. Sci.

    (Mar. 2008)
  • A.F. Atiya et al.

    A penalized likelihood based pattern classification algorithm

    Patt. Recog.

    (Jan. 2009)
  • P.N. Tan et al.

    Introduction to data mining

    (2006)
  • M.A. Karaolis et al.

    Assessment of the risk factors of coronary heart events based on data mining with decision trees

    IEEE Trans. Inform. Technol. Biomed.

    (Jan. 2010)
  • K.J. Cios

    Data mining methods for knowledge discovery

    IEEE Trans. Neural Netw.

    (Nov. 1998)
  • A. Navia-Vazquez et al.

    Distributed support vector machines

    IEEE Trans. Neural Netw.

    (Jul. 2006)
  • V.S.H. Rao et al.

    Novel approaches for predicting risk factors of atherosclerosis

    IEEE J. Biomed. Health Inform.

    (Nov. 2012)