Publication

Six Machine-Learning Methods for Predicting Hospital-Stay Duration for Patients with Sepsis: A Comparative Study

by Lingtao Chen, Hilda B Klasky
Publication Type
Conference Paper
Book Title
IEEE SoutheastCon 2022
Page Numbers
302 to 309
Publisher Location
New Jersey, United States of America
Conference Name
IEEE SoutheastCon 2022
Conference Location
Virtual, Alabama, United States of America
Conference Sponsor
IEEE

Sepsis is a life-threatening medical condition that, if not treated promptly, can result in tissue damage, organ failure, and death. According to the Centers for Disease Control and Prevention, about 270,000 individuals die of sepsis in the US each year. Further, sepsis expenditures accounted for 13% of total US hospital costs in 2013, totaling more than $24 billion. Our objective was to determine whether machine learning algorithms could reliably predict hospital-stay duration for patients with sepsis. The dataset we used has been de-identified and is freely available through the bupaR package. It includes 1,050 cases, 15,214 events, and 16 types of actions related to sepsis patient care. First, we used process mining to determine how long each patient was in the hospital. Using bupaR's functions, we created several process model graphs. These process models depict the movement of patients through a hospital and provide duration data for each patient case. Second, we identified outlier data and created two versions of the dataset: one with and one without outliers. We then applied the following analysis methods: Linear Regression, Random Forest, K-Nearest Neighbors, Neural Networks, XGBoost, and LightGBM. We compared the validation results of the six machine learning models using the same data-splitting method. We found that the XGBoost model had the best prediction accuracy: 73.9 percent for the dataset with outliers and 79 percent for the dataset without outliers. We also found that the LightGBM model had the lowest mean absolute error between predicted and actual duration: 3.66 days for the dataset with outliers and 2.4 days for the dataset without outliers. These two models outperformed the other four models. This work will be enhanced in the future by exploring new prediction algorithms and comparing them with the results of this study.
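
The duration, outlier, and error calculations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the study used bupaR in R, and the abstract does not say how outliers were identified, so the 1.5 × IQR rule below is an assumption. The toy event log, its activity labels, and all function names are hypothetical and serve only to show the arithmetic.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical toy event log: (case_id, activity, timestamp).
# These rows are illustrative only and are not data from the study.
EVENTS = [
    ("A", "ER Registration", "2014-01-01 08:00"),
    ("A", "Release", "2014-01-04 08:00"),
    ("B", "ER Registration", "2014-01-02 12:00"),
    ("B", "Release", "2014-01-03 00:00"),
]


def case_durations(events):
    """Return {case_id: days elapsed between the first and last event}."""
    first, last = {}, {}
    for case_id, _activity, stamp in events:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        first[case_id] = min(first.get(case_id, t), t)
        last[case_id] = max(last.get(case_id, t), t)
    return {c: (last[c] - first[c]).total_seconds() / 86400 for c in first}


def iqr_outliers(values):
    """Return values outside 1.5 * IQR of the quartiles (assumed rule)."""
    q1, _median, q3 = quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted durations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    durations = case_durations(EVENTS)
    print(durations)
    print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
    print(mean_absolute_error(list(durations.values()), [2.0, 1.5]))
```

In a comparison like the one reported, the same mean-absolute-error function would be applied to each model's predictions on a shared held-out split, once for the dataset with outliers and once for the dataset without them.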