Vol. 1 No. 5 (2025): December
Articles

OVERCOMING BIAS FROM MISSING VALUES IN MICROCREDIT DATA BY COMPARING MICE AND IMPUTATION METHODS

Khalisha Alya Putri
UIN Raden Fatah Palembang
Nadhin Mutiara Hervani
UIN Raden Fatah Palembang
Istiqomah
UIN Raden Fatah Palembang
Fenny Purwani
UIN Raden Fatah Palembang

Published 2025-12-23

Keywords

  • Missing Values, Simple Imputation, MICE, Microcredit, Classification, AUC, Accuracy.

Abstract

Missing values are one of the main problems in financial data processing, especially in microcredit data. Incomplete data can introduce bias, distort variable distributions, and reduce the performance of classification models used to assess creditworthiness. This study compares two imputation approaches, Simple Imputation (median for numerical attributes and mode for categorical attributes) and MICE (Multiple Imputation by Chained Equations), to reduce bias due to missing values and improve classification performance. The dataset is Loan Payments Data from Kaggle, which contains 500 rows and 11 attributes: Loan_ID, loan_status, Principal, terms, effective_date, due_date, paid_off_time, past_due_days, age, education, and Gender. After data cleaning, outlier handling, and imputation with both methods, two classification models, Logistic Regression and Random Forest, were trained on the imputed data. Model performance was evaluated using Accuracy and AUC (Area Under the ROC Curve). The results showed that MICE produced higher and more stable performance than Simple Imputation. For Logistic Regression, accuracy increased from 66.67% with Simple Imputation to 82.00% with MICE, and AUC increased from 71.02% to 95.03%. Random Forest on the Simple Imputation data achieved 100% accuracy and 100% AUC, but such perfect values may indicate overfitting, a condition in which the model memorises specific patterns in the training data and generalises poorly. On the MICE-imputed data, Random Forest still performed well, with an accuracy of 98.00% and an AUC of 99.55%, which is considered more realistic and stable. These findings indicate that MICE is more effective in reducing bias due to missing values and in improving the reliability of microcredit risk classification results.
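
The two imputation strategies and the two evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names, the toy values for age, Principal, and education, and the scores passed to the metrics are all assumptions made for illustration. The chained step is a deterministic, single-imputation simplification of the chained-equations idea; full MICE also adds random draws and produces several imputed datasets.

```python
from statistics import median, mode


def simple_impute(column, numeric):
    """Fill None with the median (numeric) or mode (categorical) of observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in column]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def chained_impute(x, y, n_iter=5):
    """Alternately regress each column on the other and refill its missing cells.

    Simplified, deterministic variant of chained equations (assumption:
    two numeric columns, linear relationship, no random draws).
    """
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Start from median fills, then refine them iteratively.
    x, y = simple_impute(x, True), simple_impute(y, True)
    for _ in range(n_iter):
        for target, other, miss in ((x, y, miss_x), (y, x, miss_y)):
            # Fit only on rows where the target was originally observed.
            rows = [i for i, m in enumerate(miss) if not m]
            a, b = fit_line([other[i] for i in rows], [target[i] for i in rows])
            for i, m in enumerate(miss):
                if m:
                    target[i] = a + b * other[i]
    return x, y


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical values, not taken from the Kaggle dataset).
age = [20, 30, 40, 50, None]
principal = [300, 400, 500, 600, 700]
education = ["college", None, "college", "high school"]

print(simple_impute(age, numeric=True))         # median fill ignores Principal
print(simple_impute(education, numeric=False))  # mode fill for a categorical column
print(chained_impute(age, principal)[0])        # fill uses the age/Principal relationship
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]), auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this toy case the median fill ignores the relationship between the two columns, while the chained fill follows it, which is the kind of distributional distortion the comparison in the abstract is concerned with.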

References

  1. Widyananda, W., Purnomo, M. F. E., Aswin, M., Mudjirahardjo, P., & Pramono, S. H. (2023). Application of data mining and imputation algorithms for missing value handling: A study case car evaluation dataset. Iraqi Journal of Science, 64(5), 2481–2491. https://doi.org/10.24996/ijs.2023.64.5.32
  2. Sharifnia, A. M., Kpormegbey, D. E., Thapa, D. K., & Cleary, M. (2025). A primer of data cleaning in quantitative research: Handling missing values and outliers. Journal of Advanced Nursing. https://doi.org/10.1111/jan.16908
  3. Lesmana, R. A., Budiman, S. N., Shodiqi, A. A., Nadhila, J. K., Aziz, M. F. N., & Faizal, A. I. (2025). Implementasi algoritma decision tree-ID3 untuk prediksi kelayakan kredit berbasis web dengan menggunakan Next.js di KSU Syariah Muhammadiyah.
  4. Law, M. T., et al. (2019). Machine learning in secondary progressive multiple sclerosis: An improved predictive model for short-term disability progression. Multiple Sclerosis Journal – Experimental, Translational and Clinical, 5(4). https://doi.org/10.1177/2055217319885983
  5. Putra, D. F., et al. (2025). Evaluasi dampak kredit mikro terhadap konsumsi rumah tangga penerima kredit mikro di Indonesia.
  6. Amaliah, H., Taan, H., & Artikel, R. (2025). Analisis kelayakan pemberian dana kredit usaha rakyat (KUR) dalam mengantisipasi terjadinya kredit bermasalah pada perbankan. Jambura Accounting Review, 6(1), 261–270.
  7. Haganawiga, K. J., Pal, S. K., & Sirohi, A. (2025). A choice of performance metrics for evaluating predictive accuracy of survival models.