Characterizing Sub-Cohorts via Data Normalization and Representation Learning

Publication Type

Conference Paper

Book Title

IEEE 33rd International Symposium on Computer Based Medical Systems

Publication Date

July 2020

Conference Name

IEEE International Symposium on Computer Based Medical Systems (CBMS)

Conference Location

Rochester, Minnesota, United States of America

Conference Sponsor

IEEE

Conference Date

Jul 28, 2020 - Jul 30, 2020

Abstract

Identifying a cohort of interest is challenging: it requires manually inspecting many patient records with complex structure, which may contain medical coding errors and missing data. This paper presents a computational pipeline for refining cohort selection based on medical concepts recorded in electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes it using standard vocabularies. A stacked denoising autoencoder then embeds the normalized patient vectors in a low-dimensional space, where the patients are clustered into sub-cohorts. The goal is to represent the cohort in a standard format and to abstract variants of sub-populations. As a use case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD) and identified four meaningful sub-cohorts using the features learned by the autoencoder. Each sub-cohort was then explored using a set of keywords for interpretation.
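
The normalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary names, codes, concept identifiers, and the function names `normalize` and `to_vectors` are all hypothetical, and the binary (multi-hot) encoding is an assumption, since the abstract does not say how the patient vectors are encoded. The autoencoder and clustering stages are omitted here.

```python
from collections import defaultdict

# Hypothetical mapping from (source vocabulary, local code) to a standard
# concept ID. All entries are made up for illustration only.
CODE_TO_CONCEPT = {
    ("vocab_a", "A1"): "concept_x",
    ("vocab_b", "B7"): "concept_x",  # different code, same standard concept
    ("vocab_a", "A2"): "concept_y",
    ("vocab_b", "B9"): "concept_y",
}


def normalize(records, mapping):
    """Collect standard concept IDs per patient, skipping unmapped codes."""
    concepts = defaultdict(set)
    for patient_id, vocab, code in records:
        concept = mapping.get((vocab, code))
        if concept is not None:  # unmapped codes (e.g. coding errors) are dropped
            concepts[patient_id].add(concept)
    return dict(concepts)


def to_vectors(patient_concepts):
    """Encode each patient as a binary vector over the sorted concept list."""
    vocabulary = sorted({c for cs in patient_concepts.values() for c in cs})
    index = {concept: i for i, concept in enumerate(vocabulary)}
    vectors = {}
    for patient_id, cs in patient_concepts.items():
        vec = [0] * len(vocabulary)
        for concept in cs:
            vec[index[concept]] = 1
        vectors[patient_id] = vec
    return vocabulary, vectors


# Toy records: (patient ID, source vocabulary, code). Entirely synthetic.
records = [
    ("p1", "vocab_a", "A1"),
    ("p1", "vocab_b", "B7"),  # maps to the same concept as A1
    ("p2", "vocab_a", "A2"),
    ("p2", "vocab_a", "ZZ"),  # no mapping, so it is ignored
    ("p3", "vocab_b", "B7"),
    ("p3", "vocab_b", "B9"),
]

patient_concepts = normalize(records, CODE_TO_CONCEPT)
vocabulary, vectors = to_vectors(patient_concepts)
```

In this sketch, codes from two source vocabularies collapse onto shared concepts, so patients recorded under different coding systems become directly comparable. Vectors of this kind would be the input to the embedding step; a patient whose codes are all unmapped would not appear in the output at all, which a real pipeline would need to handle explicitly.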
