DS Journal of Digital Science and Technology (DS-DST)

Research Article | Open Access

Volume 4 | Issue 3 | Year 2025 | Article Id: DST-V4I3P101 | DOI: https://doi.org/10.59232/DST-V4I3P101

A Machine Learning-Based Solution for Diabetes Diagnosis: Integrating Clinical Data and Vital Signs

Huy Huynh, Thanh Cao, Hai Tran

Received: 18 Jul 2025 | Revised: 10 Aug 2025 | Accepted: 18 Sep 2025 | Published: 28 Sep 2025

Citation

Huy Huynh, Thanh Cao, Hai Tran. “A Machine Learning-Based Solution for Diabetes Diagnosis: Integrating Clinical Data and Vital Signs.” DS Journal of Digital Science and Technology, vol. 4, no. 3, pp. 1-15, 2025.

Abstract

Diagnosing diabetes mellitus from clinical data and vital signs is essential for early identification and efficient management of the condition, particularly as its global prevalence rises. This study reviews general diagnostic methods that draw on diverse indicators, with a particular emphasis on diabetes, and then identifies the features needed to build an effective, unified diagnostic solution. The proposed solution uses both structured and unstructured features: structured features include vital signs, demographics, and laboratory tests, while unstructured features include chief complaints and medical notes. The solution is built on the MIMIC-IV dataset, a rich and varied source of medical data. The study proposes a prototype diagnostic system that uses the machine learning models Logistic Regression, Support Vector Machine (SVM), Gradient Boosting, Random Forest, and XGBoost. The system is designed to automatically process and combine information from different data types, such as medical text and biological indicators, to improve prediction and support clinical decision-making. Beyond offering a thorough and efficient method for diagnosing diabetes, the work shows how such a method could be deployed in real-world healthcare systems. The prototype can help doctors make fast and accurate clinical decisions, and it can serve as a starting point for future research and applications in smart healthcare.
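The combination of structured and unstructured features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the preprocessing steps, so median imputation, min-max scaling, one-hot encoding, and bag-of-words token counts are assumptions, and the records and field names are hypothetical rather than actual MIMIC-IV columns. Plain Python is used only to keep the example self-contained.

```python
from statistics import median

# Hypothetical toy records; field names are illustrative, not MIMIC-IV columns.
records = [
    {"heart_rate": 88.0, "glucose": 182.0, "gender": "F",
     "chief_complaint": "polyuria and fatigue"},
    {"heart_rate": None, "glucose": 95.0, "gender": "M",
     "chief_complaint": "chest pain"},
    {"heart_rate": 72.0, "glucose": 140.0, "gender": "F",
     "chief_complaint": "fatigue"},
]

NUMERIC = ["heart_rate", "glucose"]   # structured: vital signs, lab tests
CATEGORICAL = ["gender"]              # structured: demographics
TEXT = "chief_complaint"              # unstructured: free text


def fit(rows):
    """Learn imputation values, scaling ranges, categories, and vocabulary."""
    stats = {}
    for col in NUMERIC:
        seen = [r[col] for r in rows if r[col] is not None]
        stats[col] = (median(seen), min(seen), max(seen))
    categories = {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}
    vocab = sorted({tok for r in rows for tok in r[TEXT].lower().split()})
    return stats, categories, vocab


def transform(row, stats, categories, vocab):
    """Return one flat vector: scaled numerics + one-hot + token counts."""
    vec = []
    for col in NUMERIC:
        med, lo, hi = stats[col]
        value = med if row[col] is None else row[col]  # median imputation
        vec.append((value - lo) / (hi - lo) if hi > lo else 0.0)  # min-max
    for col in CATEGORICAL:
        vec.extend(1.0 if row[col] == cat else 0.0 for cat in categories[col])
    tokens = row[TEXT].lower().split()
    vec.extend(float(tokens.count(word)) for word in vocab)  # bag of words
    return vec


stats, categories, vocab = fit(records)
features = [transform(r, stats, categories, vocab) for r in records]
```

Each row of `features` is a single numeric vector that could be passed to any of the classifiers named in the abstract. In a real study the preprocessing would be fitted on training data only and then applied to held-out data.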

Keywords

Diabetes Diagnosis, Machine Learning, Clinical Information, Vital Signs, MIMIC-IV Dataset.
