Optimizing Random Forest for Heart Disease Prediction Using SMOTEENN and Grid Search
DOI:
https://doi.org/10.52436/1.jpti.855
Keywords:
Heart Disease, Random Forest, SMOTEENN, GridSearchCV, medical classification
Abstract
Heart disease is one of the leading causes of death worldwide, accounting for about 17.9 million deaths each year. Early and accurate diagnosis is essential for effective treatment, but class imbalance in medical datasets often biases predictive models, particularly when identifying patients with heart disease (the minority class). This study aims to optimize the performance of the Random Forest algorithm for heart disease prediction by addressing data imbalance with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) and by tuning hyperparameters with GridSearchCV. The dataset was split into training data (80%) and test data (20%), and performance was evaluated using accuracy, precision, recall, specificity, F1-score, and ROC AUC. The optimized model achieved 94% accuracy, 87% precision, 100% recall, 91% specificity, a 93% F1-score, and an AUC of 0.99. SMOTEENN proved effective at improving the representation of the minority class without introducing significant noise, while GridSearchCV identified the best-performing hyperparameter combination among those searched. The resulting Random Forest model shows strong potential as a decision-support tool for early diagnosis of heart disease, which could help reduce mortality and improve the cost efficiency of care.
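As a reading aid for the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, specificity, and F1-score from binary confusion-matrix counts. The function name `classification_metrics` and the counts passed to it are hypothetical: the abstract does not report a confusion matrix, so the numbers below were chosen purely for illustration and should not be read as the study's data. The sketch also assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn: positive (heart disease) cases predicted correctly / missed.
    tn/fp: negative cases predicted correctly / flagged by mistake.
    Assumes all denominators are non-zero.
    """
    precision = tp / (tp + fp)          # share of predicted positives that are real
    recall = tp / (tp + fn)             # share of real positives that are found
    specificity = tn / (tn + fp)        # share of real negatives that are found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=20, fp=3, tn=30, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts there are no false negatives, so recall is 1.00 while precision is lower because of the three false positives. This is the same qualitative pattern as the reported results: a model tuned to miss no patients tends to accept some false alarms.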