Application of the eXtreme Gradient Boosting (XGBoost) Method to Indonesian Air Quality Classification Using Conditional Generative Adversarial Network (CGAN) Data Augmentation
DOI:
https://doi.org/10.26714/uwc.v8.403-416.2025
Keywords:
CGAN, classification, air quality, XGBoost
Abstract
Background: Air quality in Indonesia has declined year after year and has become a serious problem, particularly because of its impact on public health and the environment. One example is the rising number of acute respiratory infection (ISPA) cases, which reached 1.5 to 1.8 million in 2023, with poor air quality among the contributing factors. More effective air quality detection and monitoring are therefore needed.

Method: Machine learning methods such as eXtreme Gradient Boosting (XGBoost) have been widely used to classify air quality. However, class imbalance can degrade a classifier, because the model tends to be biased toward the majority class. This study therefore uses a Conditional Generative Adversarial Network (CGAN) as a data augmentation technique, adding synthetic samples to the minority classes to improve classification performance.

Results: CGAN data augmentation improved the balance of predictions across classes, especially for the minority classes. The CGAN-XGBoost model achieved an accuracy of 97.08%, precision of 97.11%, recall of 97.1%, F1-score of 97.08%, and AUC of 98.82%. These results indicate that XGBoost is a stable and accurate method in making use of CGAN synthetic data.

Conclusion: CGAN-XGBoost is therefore considered effective in handling class imbalance in Indonesian air quality data.
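The link between class imbalance and the reported metrics can be illustrated with a minimal sketch in plain Python. It is not the study's pipeline: the class names (`good`, `moderate`, `unhealthy`), the class counts, and the helper functions `synthetic_samples_needed` and `macro_metrics` are hypothetical, and macro averaging is an assumption, since the abstract does not state how precision, recall, and F1-score were averaged. The sketch shows how a classifier that always predicts the majority class can reach high accuracy while its macro recall stays low, and how many synthetic samples a generator such as a CGAN would have to supply to bring each class up to the size of the largest one.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Per class, how many extra samples would match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (no predictions or no true samples) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical imbalanced label set; the counts are illustrative only.
y_true = ["good"] * 90 + ["moderate"] * 8 + ["unhealthy"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["good"] * 100

print(synthetic_samples_needed(y_true))
print(macro_metrics(y_true, y_pred))
```

In this toy case accuracy is 0.9 while macro recall is only one third, which is why accuracy alone can hide poor performance on minority classes and why the abstract reports precision, recall, and F1-score alongside it.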


