About four years ago, ORNL began considering how its scientific strengths could apply to health care. “We were in a unique position with our leadership computing resources and data science expertise, and we saw an opportunity to use health data to discover data-driven insights for better health care quality, integrity, and policy,” said Sreenivas Rangan Sukumar, a research scientist in the Computational Sciences and Engineering Division of ORNL’s Computing and Computational Sciences Directorate.
To analyze publicly available health-related datasets, ORNL computer science researchers leveraged three diverse high-performance computing architectures: the multicore computing Titan (the second-most powerful computer in the world), in-memory graph-computing Apollo (ORNL’s nickname for the Urika computer built by YarcData), and distributed storage cloud-computing machines. The publicly available datasets came from sources such as The Cancer Genome Atlas, clinicaltrials.gov, Semantic MEDLINE, openFDA, DocGraph, National Plan and Provider Enumeration System, and clinical partnerships.
Health data is unstructured, challenging to integrate, and restricted by privacy and confidentiality regulations. Researchers must work under privacy-preserving data-use agreements and under the supervision and approval of an institutional review board. Within these parameters, ORNL researchers are using Titan to simulate outcomes of interventions, Apollo for pattern discovery, and cloud computing to understand what happened.
Although applying ORNL’s computing resources to anonymized health care data was relatively routine, meeting the big data challenge of the complicated American health care system is not. Data-driven health care policy relies on the ability to “slice and dice” complex data. Unfortunately, state-of-the-art information technology solutions have created silos, do not scale to these data sizes, or are not flexible enough to accommodate the diversity of big data. ORNL computing experts found a better approach in scalable graph computing, which allows for detailed analysis and discovery of relationships hiding in large quantities of data. By organizing health care data into relationship graphs (linked structures of interacting entities), researchers were able to mine and understand complex patterns of relationships and behaviors in health care delivery.
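To make the idea of a relationship graph concrete, the following is a minimal Python sketch, not ORNL's implementation. It assumes a tiny, made-up list of claim records (the `claims` data and the function names `build_graph` and `shared_patient_counts` are hypothetical) and shows how providers and patients might be linked, and how provider pairs could be related through the patients they share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, made-up claim records: (provider_id, patient_id).
claims = [
    ("prov_A", "pat_1"),
    ("prov_A", "pat_2"),
    ("prov_B", "pat_2"),
    ("prov_B", "pat_3"),
    ("prov_C", "pat_3"),
]


def build_graph(claims):
    """Return adjacency sets for an undirected provider-patient graph."""
    graph = defaultdict(set)
    for provider, patient in claims:
        graph[provider].add(patient)
        graph[patient].add(provider)
    return graph


def shared_patient_counts(claims):
    """Count how many patients each pair of providers has in common."""
    providers_by_patient = defaultdict(set)
    for provider, patient in claims:
        providers_by_patient[patient].add(provider)

    counts = defaultdict(int)
    for providers in providers_by_patient.values():
        # Every pair of providers seen by the same patient gets one link.
        for a, b in combinations(sorted(providers), 2):
            counts[(a, b)] += 1
    return dict(counts)


graph = build_graph(claims)
pairs = shared_patient_counts(claims)
```

In this toy data, `prov_A` and `prov_B` are linked through one shared patient, as are `prov_B` and `prov_C`. A production system would run the same kind of query on a scalable graph engine rather than in-memory dictionaries.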
One area where graph computing provided immediate feedback was in understanding waste, fraud, and abuse within the federal health care system. ORNL analyzed provider data files to identify probable risk factors, which led to the identification of high-risk vendors and providers. Using health care data in nontraditional ways to draw connections between interacting patients and providers helped establish patterns that can identify risk for fraudulent activities and further refine and categorize normal and abnormal behaviors.
Graph computing offers the ability to discover interesting patterns among the entities that interact within a system. Georgia Tourassi, director of ORNL’s Health Data Sciences Institute (HDSI), offered two examples from the institute’s work. One case revealed organized crime via identity obfuscation (one health care provider using multiple identities to bill patients), whereas the second case revealed guilt-by-association patterns that could predict the risk of a new provider before that provider enrolls and begins billing.
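The two patterns described above can be illustrated with a small Python sketch. This is an assumption-laden toy and not the institute's method: the `identities` records, the attribute names, and the functions `linked_identity_groups` and `association_risk` are all hypothetical. The first function groups provider identities that are connected through any shared attribute (a simple union-find), and the second scores a provider by the fraction of its graph neighbors that are already flagged.

```python
from collections import defaultdict

# Hypothetical provider identities with made-up contact attributes.
identities = {
    "id_1": {"phone": "555-0100", "address": "1 Main St"},
    "id_2": {"phone": "555-0100", "address": "9 Oak Ave"},
    "id_3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "id_4": {"phone": "555-0142", "address": "3 Elm Rd"},
}


def linked_identity_groups(identities):
    """Group identities connected through any shared attribute value."""
    parent = {name: name for name in identities}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for name, attrs in identities.items():
        for key, value in attrs.items():
            # The first identity seen with this value anchors the link.
            owner = first_seen.setdefault((key, value), name)
            parent[find(name)] = find(owner)

    groups = defaultdict(set)
    for name in identities:
        groups[find(name)].add(name)
    # Only groups with more than one identity are worth a closer look.
    return [group for group in groups.values() if len(group) > 1]


def association_risk(neighbors, flagged):
    """Fraction of a provider's neighbors that are already flagged."""
    neighbors = set(neighbors)
    if not neighbors:
        return 0.0
    return len(neighbors & set(flagged)) / len(neighbors)


groups = linked_identity_groups(identities)
risk = association_risk({"a", "b", "c", "d"}, {"a", "b"})
```

Here `id_1`, `id_2`, and `id_3` fall into one group because they chain together through a shared phone number and a shared address, while `id_4` stands alone. A shared attribute is only a lead for review, not proof that two identities belong to the same provider.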
Using the Urika graph analytics appliance, Sukumar and his peers are developing new graph-theoretic models and are applying graph analytics algorithms to pattern discovery and predictive modeling in search of efficient clinical pathways to improve health outcomes. “We believe that the transformation of health care delivery toward improved quality and health outcomes has a better chance of success if patients, providers, and payers all benefit,” Sukumar said. “We are using our computing resources to identify such opportunities.”
Tourassi said ORNL’s approach is novel in health care. Big data computing capabilities in US Department of Energy facilities such as ORNL “are critical to health care delivery,” she said. “It’s a paradigm shift in an environment that has always been reactive. We’re offering scientific innovations to help the system become more proactive.”
HDSI’s goal is to find unbiased, data-driven solutions. In research, a shortage of time or resources or a failure to ask the right questions can bias a study design and its outcomes; this is where ORNL’s involvement and scientific approach are critical strengths. “Data-driven scientific innovation leveraging ORNL’s computing resources and data science expertise can offer less biased or even unbiased views,” Tourassi said. “As a federally funded research and development center, ORNL can be a natural, honest broker.
“There is a lot of mathematical and computing innovation behind what we are doing by investing in algorithms that are scalable, so we can see where we need to be to handle health data challenges,” Tourassi said. “We know for certain that health data will be getting bigger and more complex as the practice of medicine expands and progresses. By being involved and leveraging the investment, we can anticipate and prepare for the next ‘bottleneck.’”
HDSI is reaching out to partners who have different types of data and diverse needs for data analysis—such as genomics, electronic health records, and health-sensor data. Projects will help partners collect, move, store, integrate, and analyze their data for insights to determine the next-generation practice of personalized medicine.
For example, ORNL researchers are building the capability for clinical experts to semantically reason with medical records and medical literature and to integrate and associate novel health data types (such as claims, medical records, genotypes, and phenotypes), while at the same time simulating the outcomes of different intervention pathways, Sukumar said.
Through strategic clinical partnerships, ORNL and HDSI hope to make a difference in specific areas such as neuroscience and cancer research by supporting the development of better health care delivery and better health outcomes at a lower cost.
“This is the first step,” Sukumar added, “toward leveraging data and computing capability to contribute to the science of medicine.”
Funding for HDSI is provided through ORNL’s Laboratory Directed Research and Development program.
UT-Battelle manages ORNL for the Department of Energy’s Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov/.