Waiting for answers about a health condition can be as stressful as the condition itself. Maria Mahbub, a research collaborator at Oak Ridge National Laboratory, is developing technology that could help providers and patients get answers sooner.
Healthcare providers often look to trusted documentation known as clinical practice guidelines, or CPGs, as a roadmap for patient care decisions. These guidelines contain evidence-based, timely recommendations and best practices from therapy-area-specific medical organizations.
However, the guidelines can be dense and time-consuming to use effectively. Mahbub’s group aims to go beyond speeding up the CPG reading process: the team is teaching machines to dive into the guidelines and surface query-specific answers. The work was published in IEEE Xplore, the digital library of the Institute of Electrical and Electronics Engineers.
“We wanted to provide clinicians with a system that can extract information for them,” Mahbub said. “One of the main issues we tackled is data unavailability.”
Biomedical machine reading comprehension, or bio-MRC, is a critical aspect of natural language processing in clinical contexts. Work in bio-MRC currently falls into four areas: scholarly articles, clinical notes, medical examinations and consumer health questions. By applying bio-MRC to CPGs, Mahbub’s group has opened a new one.
In most machine learning contexts, including bio-MRC, machines need to train on large, labeled datasets. Because these methods had never been applied to CPGs before, no datasets existed for training machines on them. To take bio-MRC in a new direction, Mahbub and her team had to draw their own map, or, more precisely, build their own dataset.
Her team collaborated with the Department of Veterans Affairs to generate a small dataset of 1,000 questions, each paired with an answer and the guideline passage it came from. Subject matter experts read the CPG documents paragraph by paragraph to create these question-answer pairs, and the team applied a technique called transfer learning, in which a model pre-trained on one task is adapted to a new problem. Then, as she does with most of her research on robust models, Mahbub conducted an error analysis across components of the CPG, types of questions and lengths of answers.
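As an illustration of the kind of extractive question answering described here, the sketch below uses the open-source Hugging Face transformers library. The model name, question and guideline passage are placeholders rather than the team's actual configuration; in the published work, a pre-trained model is further fine-tuned, via transfer learning, on the roughly 1,000 expert-written CPG question-answer pairs.

```python
# Minimal sketch of extractive question answering over a guideline passage.
# Assumptions: the pre-trained model name and the example text below are
# illustrative only, not the configuration used in the paper.
from transformers import pipeline

# Load a model already pre-trained for span-extraction question answering.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# A guideline-style paragraph (the "context") and a clinician's question.
context = (
    "For adults with newly diagnosed hypertension, the guideline recommends "
    "initiating lifestyle modification and reassessing blood pressure within "
    "three months."
)
question = "When should blood pressure be reassessed after lifestyle modification?"

# The model extracts the answer as a span of the original text.
result = qa(question=question, context=context)
print(result["answer"], result["score"])
```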
Ultimately, the group’s model answered 78% of the questions in each set with an exact match, meaning the text it extracted was identical to the expert-written answer. Mahbub said this was a satisfactory result for the small dataset they used.
“Basically, this tells us that 78% of the time, the machine has been able to extract correct information from the clinical practice guideline,” Mahbub said. “That’s a pretty good number for the small amount of data.”
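The exact-match figure can be read as a strict scoring rule: an extracted answer counts only if it matches an expert-written reference word for word after light normalization. The sketch below follows the common SQuAD-style convention for that metric; it is an assumption for illustration, not necessarily the paper's exact implementation.

```python
# Toy illustration of the exact-match metric (SQuAD-style convention assumed).
import re
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation, drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    # The prediction must equal at least one reference answer after normalization.
    return any(normalize(prediction) == normalize(ref) for ref in references)

# Hypothetical predictions and reference answers.
predictions = ["within three months", "twice daily"]
references = [["within three months"], ["once daily"]]

score = sum(exact_match(p, r) for p, r in zip(predictions, references)) / len(predictions)
print(f"Exact match: {score:.0%}")  # 50% in this toy example
```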
However, the team noted in the paper that the model leaves room for improvement, both now and in future applications. Currently, it pulls information only from the guideline text, working from search terms and phrases. Mahbub said the team wants to enhance the model to handle tabulated data so it can also extract information from the tables and charts found in CPGs.
Another hurdle to overcome is one most healthcare technology runs into eventually: provider trust and, beyond that, adoption. Healthcare is notoriously slow to adopt new technology compared with other industries, in part because care teams must first come to trust the tools.
Mahbub noted that the team has thought about this and is working on a specific application of MRC to clinical notes that draws out information about injection drug use. The team will roll out the technology slowly, starting with a small group of emergency room doctors, to gauge trust in and acceptance of the approach.
“I think going slow and one step at a time is the best way to go here,” Mahbub said.
UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://energy.gov/science.