Dataset Repository for Investigating Suicide Risk Using Social and Environmental Determinants of Health

Publication Type
ORNL Report
Publication Date

Suicide is frequently modeled as a function of genetics and environment, where the latter refers to factors other than direct biological consequences, such as air quality, financial status, social connectivity, access to transportation and food, and homelessness. According to the World Health Organization, clean air, a stable climate, adequate water, sanitation and hygiene, safe chemical use, radiation protection, healthy and safe workplaces, sound agricultural practices, health-supportive cities and built environments, and a preserved natural environment are all prerequisites for good health. Understanding the relationships between these determinants and mental health outcomes requires standardized data that can be incorporated into healthcare programs and health outcome models. Various US organizations provide a wealth of publicly available data on social and environmental factors that can inform the design of health care systems and public health interventions and improve our understanding of the factors that affect health. Such information would not only improve the understanding of individual and community risk but also help identify new risk factors that have not previously been therapeutically targeted, especially with respect to mental health. However, curating and standardizing such datasets is challenging because they are often recorded at different geographic and temporal resolutions. To address this challenge, we launched an effort in conjunction with the Veterans Health Administration (VHA) to collect publicly available data on socioeconomic and environmental determinants of health in the US.
In this manuscript, we describe a social and environmental determinants of health (SEDH) dataset repository, data curation documentation, and a pipeline framework for data generation. This effort started in 2020, when we began constructing a scalable pipeline to automate the download, extraction, preparation, analysis, and production of datasets. These datasets have been made available to the VHA and may be shared with collaborating organizations upon agreement.