EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records

Journal Name: Journal of Biomedical Informatics

Objective: Physicians and clinicians rely on data contained in electronic health records (EHRs), as recorded by health information technology (HIT), to make informed decisions about their patients. The reliability of HIT systems in this regard is critical to patient safety. Consequently, better tools are needed to monitor the performance of HIT systems for potential hazards that could compromise the collected EHRs, which in turn could affect patient safety. In this paper, we propose a new framework for detecting anomalies in EHRs using sequences of clinical events. This new framework, EHR-Bidirectional Encoder Representations from Transformers (BERT), is motivated by gaps in existing deep-learning-based methods, including high false-negative rates, sub-optimal accuracy, higher computational cost, and the risk of information loss. EHR-BERT is rooted in the BERT architecture and tailored to overcome the hurdles of the contemporary BERT method, thereby enhancing anomaly detection in EHRs for healthcare applications.

Methods: The EHR-BERT framework was designed using the Sequential Masked Token Prediction (SMTP) method. This approach treats EHRs as natural language sentences and iteratively masks input tokens during both training and prediction stages. This method facilitates the learning of EHR sequence patterns in both directions for each event and identifies anomalies based on deviations from the normal execution models trained on EHR sequences.
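
The masking idea described in the Methods can be illustrated with a minimal sketch. This is not the EHR-BERT implementation: a simple count-based model over each event's left and right neighbors stands in for the BERT encoder, and the event names, the `top_k` threshold, and the scoring rule (fraction of masked positions whose true event is not among the top predictions) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"


def train_context_model(sequences):
    """Count which event occurs between each (left, right) neighbor pair.

    Toy stand-in for a bidirectional encoder: it uses context on both
    sides of a position, but only one token deep.
    """
    model = defaultdict(Counter)
    for seq in sequences:
        padded = ["[CLS]"] + list(seq) + ["[SEP]"]
        for i in range(1, len(padded) - 1):
            model[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return model


def top_k_predictions(model, left, right, k):
    """Return the k most frequent events seen between left and right."""
    counts = model.get((left, right), Counter())
    return [event for event, _ in counts.most_common(k)]


def anomaly_score(model, seq, top_k=2):
    """Mask each position in turn and predict it from its neighbors.

    The score is the fraction of positions whose true event is missing
    from the top-k predictions (an assumed scoring rule).
    """
    if not seq:
        return 0.0
    padded = ["[CLS]"] + list(seq) + ["[SEP]"]
    misses = 0
    for i in range(1, len(padded) - 1):
        masked = padded.copy()
        true_event = masked[i]
        masked[i] = MASK  # hide the event being predicted
        candidates = top_k_predictions(model, masked[i - 1], masked[i + 1], top_k)
        if true_event not in candidates:
            misses += 1
    return misses / len(seq)


# Hypothetical event sequences, invented for this example.
normal = ["admit", "triage", "lab_order", "lab_result", "discharge"]
anomalous = ["admit", "discharge", "lab_order", "lab_result", "triage"]

model = train_context_model([normal] * 3)
normal_score = anomaly_score(model, normal)
anomalous_score = anomaly_score(model, anomalous)
print(normal_score, anomalous_score)
```

In this sketch the reordered sequence receives a higher score than the sequence the model was fitted on, which mirrors the idea of flagging deviations from learned normal patterns. A real system would replace the neighbor-count model with a trained masked-language model and calibrate the threshold on held-out data.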

Results: Extensive experiments on large EHR datasets across various medical domains demonstrate that EHR-BERT markedly improves upon existing models. It significantly reduces the number of false positives and enhances the detection rate, thus bolstering the reliability of anomaly detection in electronic health records. This improvement is attributed to the model’s ability to minimize information loss and maximize data utilization effectively.

Conclusion: EHR-BERT showcases immense potential in decreasing medical errors related to anomalous clinical events, positioning itself as an indispensable asset for enhancing patient safety and the overall standard of healthcare services. The framework effectively overcomes the drawbacks of earlier models, making it a promising solution for healthcare professionals to ensure the reliability and quality of health data.