Adaptive Anomaly Detection for Dynamic Clinical Event Sequences...

Publication Type
Conference Paper
Book Title
2020 IEEE International Conference on Big Data (IEEE BigData 2020)
Publication Date
Page Numbers
4919 to 4928
Conference Name
IEEE International Conference on Big Data
Conference Location
Atlanta, Georgia, United States of America
Conference Sponsor
Conference Date

Over the past decade, health information technology (IT) has greatly expanded the amount of digital information stored in electronic health records (EHRs). However, according to some studies, hazards in health IT can lead to changes in clinical decisions, care processes, and care outcomes, among other issues. Thus, the effects of health IT hazards on patient safety have been at the forefront of recent patient safety research. Nonetheless, hazard detection in health IT remains a challenge. In this paper, the authors assume that safety-related issues in health IT exhibit anomalous characteristics in EHR data. Although all hazards exhibit some anomalous characteristics, not all anomalies can be regarded as hazards. The authors hypothesize that errors in health IT can interrupt the sequence of clinical actions. To this end, the paper considers the problem of detecting anomalous sequences in big EHR data. It focuses on dynamic event sequences, which are series of clinical actions in motion. The authors propose an adaptive anomaly detection approach that uses a higher-order network representation to detect anomalous sequences. Furthermore, the authors propose a contiguous subsequence anomaly detection approach that identifies abnormal subsequences within the detected anomalous sequences. The proposed approaches are tested on synthetic and real-world EHR data, where they outperform existing state-of-the-art anomaly detection techniques. To reduce the computational complexity of an operational implementation, the authors leveraged the Apache Spark environment, which yielded a much shorter run time together with improved performance, especially for data with more than 60,000 sequences.
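
The abstract does not give the details of the higher-order network representation or of the subsequence detector, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a simple stand-in: transition counts conditioned on up to two preceding events (with back-off to a shorter context), a sequence-level score equal to the mean surprise of its transitions, and Kadane's maximum-sum algorithm to pick out the most anomalous contiguous run. All function names, the smoothing constant `alpha`, the `baseline` threshold, and the toy event labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_transitions(sequences, order=2):
    """Count which event follows each context of 1..order preceding events."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for i in range(1, len(seq)):
            for k in range(1, order + 1):
                if i - k >= 0:
                    counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts, vocab


def step_surprise(counts, vocab_size, context, event, alpha=0.1):
    """Smoothed -log P(event | context); unseen long contexts back off to shorter ones."""
    while len(context) > 1 and context not in counts:
        context = context[1:]
    following = counts.get(context, {})
    total = sum(following.values())
    prob = (following.get(event, 0) + alpha) / (total + alpha * vocab_size)
    return -math.log(prob)


def sequence_surprises(counts, vocab_size, seq, order=2):
    """Surprise of every transition in seq; entry j covers seq[j] -> seq[j + 1]."""
    return [
        step_surprise(counts, vocab_size, tuple(seq[max(0, i - order):i]), seq[i])
        for i in range(1, len(seq))
    ]


def anomaly_score(surprises):
    """Sequence-level score: mean surprise over its transitions (0.0 if none)."""
    return sum(surprises) / len(surprises) if surprises else 0.0


def most_anomalous_span(surprises, baseline):
    """Kadane's algorithm on (surprise - baseline).

    Returns inclusive (start, end) transition indexes of the contiguous run
    with the largest total excess surprise, or None for an empty input.
    """
    best_sum, best_span = float("-inf"), None
    run_sum, run_start = 0.0, 0
    for j, s in enumerate(surprises):
        if run_sum <= 0:
            run_sum, run_start = 0.0, j
        run_sum += s - baseline
        if run_sum > best_sum:
            best_sum, best_span = run_sum, (run_start, j)
    return best_span


# Toy data (hypothetical event labels, not taken from the paper).
normal = ["admit", "triage", "labs", "diagnose", "discharge"]
disrupted = ["admit", "triage", "discharge", "labs", "diagnose"]
counts, vocab = fit_transitions([normal] * 20)

normal_s = sequence_surprises(counts, len(vocab), normal)
disrupted_s = sequence_surprises(counts, len(vocab), disrupted)
span = most_anomalous_span(disrupted_s, baseline=1.0)
```

In this toy setup the disrupted sequence scores higher than the normal one, and `span` marks the transitions around the out-of-place `discharge` event. A real EHR workload would need a principled threshold and, at the scale mentioned in the abstract, a distributed counting step.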