#9 – EFS- CC: Ensemble Feature Selection-using Correlation and Cross-Validation Score to Predict Heart Disease in the Healthcare Industry

H. Karthikeyan and T. Menakadevi. EFS- CC: Ensemble Feature Selection-using Correlation and Cross-Validation Score to Predict Heart Disease in the Healthcare Industry. Dynamic Systems and Applications 30 (2021) No.8, 1346 – 1361

https://doi.org/10.46719/dsa20213089

ABSTRACT.
Feature selection is a data preprocessing step in applications of Artificial Intelligence and Machine Learning. It is the process of finding an optimal subset of features from the large feature set available in data analytics. The selected features are given as input to a machine learning model to improve the model's prediction accuracy in a particular domain. To find the optimal features, we propose an ensemble feature selection model based on correlation and cross-validation score, which combines a filter approach with a Forward Feature Selection model using SVM (FFS-SVM). The model uses the correlation coefficient for the filter approach and the cross-validation score for FFS-SVM as the metrics for finding optimal features. The proposed model was evaluated using Python packages on the Heart Disease dataset from the UCI repository. It achieved 93.5% accuracy, 90.0% sensitivity, 95.24% specificity, and 90.0% precision on the heart disease prediction dataset. On the same dataset, we also ran SVM, Random Forest, and KNN without any feature selection mechanism; compared with these baselines, the proposed model improved accuracy by 13.13% over SVM, 16.86% over Random Forest, and 22.48% over KNN.
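The two ingredients named in the abstract, a correlation filter and forward feature selection scored by the cross-validated accuracy of an SVM, can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is a synthetic stand-in rather than the UCI Heart Disease dataset, the correlation threshold, the number of forward-selected features, and the linear kernel are arbitrary, and combining the two subsets by union is an assumed ensemble rule that the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def correlation_filter(X, y, threshold):
    """Filter step: keep features whose |Pearson r| with the label is >= threshold."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) >= threshold)


def forward_select_svm(X, y, n_features, cv=5):
    """Wrapper step: forward selection scored by cross-validated SVM accuracy."""
    svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    selector = SequentialFeatureSelector(
        svm,
        n_features_to_select=n_features,
        direction="forward",
        scoring="accuracy",
        cv=cv,
    )
    selector.fit(X, y)
    return np.flatnonzero(selector.get_support())


# Synthetic stand-in data (assumption): sizes are arbitrary, not taken from the paper.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Run both selectors on the training split only, so the test split stays unseen.
filter_idx = correlation_filter(X_train, y_train, threshold=0.1)  # threshold assumed
wrapper_idx = forward_select_svm(X_train, y_train, n_features=5)  # count assumed

# Assumed ensemble rule: union of the two subsets (not specified in the abstract).
selected = np.union1d(filter_idx, wrapper_idx)

# Train the final SVM on the selected features and score it on the held-out split.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train[:, selected], y_train)
test_accuracy = model.score(X_test[:, selected], y_test)

print("filter subset: ", filter_idx.tolist())
print("wrapper subset:", wrapper_idx.tolist())
print("ensemble subset:", selected.tolist())
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Selecting features on the training split alone avoids leaking test information into the selection step. Other ensemble rules, such as intersecting the two subsets or ranking features by a combined score, would fit the same skeleton.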

Keywords: Feature Selection, Filter Methods, Wrapper Methods, Variable Selection, Information Gain.
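For readers less familiar with the four metrics reported in the abstract, the sketch below computes them from confusion-matrix counts. The counts are hypothetical: the abstract does not give the test-set confusion matrix, and these values were chosen only because they happen to be consistent with the reported figures.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and precision from raw counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # share of predicted positives that are correct
    }


# Hypothetical counts (assumption, not stated in the source).
metrics = classification_metrics(tp=9, fn=1, tn=20, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts, accuracy is 29/31, sensitivity and precision are both 9/10, and specificity is 20/21, which agree with the reported 93.5%, 90.0%, 90.0%, and 95.24% after rounding.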