Imbalanced Data Problem Solving in Classification of Diabetes Patients

Authors

  • Witwisit Kesornsit (วิชญ์วิสิฐ เกษรสิทธิ์), Master of Science student in Statistics, Graduate School of Applied Statistics, National Institute of Development Administration
  • Dr. Vichit Lorchirachoonkul (ดร.วิชิต หล่อจีระชุณห์กุล), Associate Professor of Statistics, Graduate School of Applied Statistics, National Institute of Development Administration
  • Dr. Jirawan Jitthavech (ดร.จิราวัลย์ จิตรถเวช), Professor of Statistics, Graduate School of Applied Statistics, National Institute of Development Administration

Keywords:

Classification, Imbalanced data, Decision tree, Multinomial logistic regression

Abstract

Classifying imbalanced data is a challenging problem in classification research. When the classes are imbalanced, features of the majority class tend to obscure the characteristics of the minority class, which makes classification performance on the minority class unacceptable. This research compares the effectiveness of four data-level solutions for handling the imbalanced data of diabetes patients: oversampling, undersampling, a hybrid method, and the Synthetic Minority Oversampling Technique (SMOTE). Each is evaluated with two classifiers, multinomial logistic regression and a decision tree. Comparing the evaluation statistics across the two classification algorithms, the study concludes that the decision tree technique combined with SMOTE yields the best result.
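
The four data-level approaches named above can be illustrated with a short Python sketch. This is not the authors' implementation, which the abstract does not describe. The function names, the toy data, the hybrid target of 5 rows per class, and the neighbour count k = 2 are all assumptions chosen for illustration. Oversampling duplicates minority rows, undersampling discards majority rows, the hybrid variant does both toward an intermediate class size, and SMOTE creates new minority rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def group_by_class(X, y):
    """Map each class label to the list of feature rows carrying it."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def resample_to(X, y, target, rng):
    """Bring every class to exactly `target` rows.

    Classes with at least `target` rows are randomly undersampled (without
    replacement); smaller classes are randomly oversampled by duplicating
    rows (with replacement).
    """
    X_out, y_out = [], []
    for label, rows in group_by_class(X, y).items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)
        else:
            chosen = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic rows, each interpolated between a randomly
    chosen minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # Sort the remaining minority rows by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 8 majority rows (label 0), 3 minority rows (label 1).
rng = random.Random(0)
X = [[float(i), float(i % 3)] for i in range(8)]
X += [[10.0, 10.0], [11.0, 12.0], [12.0, 11.0]]
y = [0] * 8 + [1] * 3
sizes = Counter(y)

X_over, y_over = resample_to(X, y, max(sizes.values()), rng)    # oversampling
X_under, y_under = resample_to(X, y, min(sizes.values()), rng)  # undersampling
X_hyb, y_hyb = resample_to(X, y, 5, rng)  # hybrid, assumed intermediate target

minority = [row for row, label in zip(X, y) if label == 1]
new_rows = smote(minority, n_new=5, k=2, rng=rng)               # SMOTE
X_smote, y_smote = X + new_rows, y + [1] * len(new_rows)

print(Counter(y_over), Counter(y_under), Counter(y_hyb), Counter(y_smote))
```

In each case the resampled rows would then be passed to the classifier in place of the original training data. Resampling is normally applied to the training split only, so that the evaluation data keeps the original class distribution.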

Published

2018-09-12

Section

Research Article