Abstract
Detecting anomalous sequences is an integral part of building and protecting modern large-scale health information technology (HIT) systems. These systems generate a large volume of records of patients’ states and significant events, which are a valuable resource for improving clinical decisions and patient care processes, among other uses. However, detecting anomalous sequences in electronic health records (EHR) remains a challenge in healthcare applications for several reasons, including imbalance in the data, the complexity of relationships between events in a sequence, and the curse of dimensionality. Conventional anomaly detection methods discriminate between sequences using only the finite sequence of events. They fail to incorporate salient event details under variable higher-order dependencies (e.g., the duration between events) that could discriminate sequences more effectively. To address this problem, we propose event sequence and subsequence anomaly detection algorithms that (1) use network-based representations of interactions in the data, (2) account for variable higher-order dependencies in the data, and (3) incorporate event durations to discriminate the data adequately. The proposed approach identifies anomalies by monitoring the change in the graph after a test sequence is removed from the network. The change is quantified using graph distance metrics, so that dramatic changes in the network can be attributed to the removed sequence. Furthermore, the proposed subsequence algorithm recommends plausible paths and salient information for the detected anomalous subsequences. Our results show that the proposed event sequence anomaly detection algorithm outperforms the baseline methods on both synthetic data and real-world EHR data.
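The removal-based scoring idea described above can be illustrated with a minimal sketch. It is not the proposed algorithm: it assumes a plain first-order transition graph rather than variable higher-order dependencies, ignores event durations, and uses an L1 distance between normalized edge weights as a stand-in for the graph distance metrics. The function names and the event labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def transition_counts(sequences):
    """Weighted edge list of a first-order network: counts of (a -> b) transitions."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def normalized(counts):
    """Scale edge weights so they sum to 1 (empty graph maps to an empty dict)."""
    total = sum(counts.values())
    return {edge: w / total for edge, w in counts.items()} if total else {}


def graph_distance(g1, g2):
    """Assumed metric: L1 distance between normalized edge-weight vectors."""
    p, q = normalized(g1), normalized(g2)
    return sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in p.keys() | q.keys())


def removal_scores(sequences):
    """Score each sequence by how much the graph changes when it is removed."""
    full = transition_counts(sequences)
    scores = []
    for seq in sequences:
        # Counter subtraction drops edges whose count falls to zero.
        reduced = full - transition_counts([seq])
        scores.append(graph_distance(full, reduced))
    return scores


# Hypothetical toy data: five routine visits and one unusual pathway.
sequences = [["admit", "triage", "lab", "discharge"]] * 5
sequences.append(["admit", "surgery", "icu", "admit"])

scores = removal_scores(sequences)
most_anomalous = max(range(len(sequences)), key=scores.__getitem__)
```

In this toy data the last sequence receives the largest score, because removing it deletes three edges that no other sequence supports, while removing a routine visit only reduces the weight of edges that remain in the graph.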