Publication

Large-scale deep learning for metastasis detection in pathology reports

Publication Type
Journal
Journal Name
JAMIA Open
Publication Date
Volume
8
Issue
4

Objectives: No existing algorithm can reliably identify metastasis from pathology reports across multiple cancer types and the entire US population. In this study, we develop a deep learning model that automatically detects patients with metastatic cancer using pathology reports that come from many laboratories and cover multiple cancer types.

Materials and Methods: We use 60 471 unstructured pathology reports from 4 Surveillance, Epidemiology, and End Results (SEER) registries. Each report was coded with 1 of 3 labels: metastasis negative, metastasis positive, or metastasis undetermined. We train a task-specific deep neural network from scratch and compare its performance with that of a widely used large language model (LLM).

Results: Our deep learning architecture trained on task-specific data outperforms a general-purpose LLM, with a recall of 0.894 compared to 0.824. We quantified model uncertainty and used it to defer uncertain reports for human review. Retaining 72.9% of reports for automated classification increased recall from 0.894 to 0.969.
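
The deferral step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes the uncertainty score is the model's maximum class probability, that reports below a fixed threshold are deferred to human review, and that recall is computed for the metastasis-positive class on the retained reports only. The function names, the label encoding (0 = negative, 1 = positive, 2 = undetermined), and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def abstain_by_confidence(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, bool]]:
    """For each report, return (predicted label, retained flag).

    Assumption: confidence is the maximum class probability; a report is
    retained only when that confidence reaches the threshold.
    """
    decisions = []
    for p in probs:
        label = max(range(len(p)), key=lambda i: p[i])
        decisions.append((label, p[label] >= threshold))
    return decisions


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """Recall for one class: true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def retained_recall(
    probs: Sequence[Sequence[float]],
    y_true: Sequence[int],
    threshold: float,
    positive: int = 1,
) -> Tuple[float, float]:
    """Return (fraction of reports retained, recall on the retained reports)."""
    decisions = abstain_by_confidence(probs, threshold)
    kept = [(t, label) for t, (label, keep) in zip(y_true, decisions) if keep]
    coverage = len(kept) / len(y_true)
    if not kept:
        return coverage, float("nan")
    kept_true, kept_pred = zip(*kept)
    return coverage, recall(kept_true, kept_pred, positive)


# Toy data (hypothetical): 0 = negative, 1 = positive, 2 = undetermined.
toy_probs = [
    [0.05, 0.90, 0.05],  # confident, correct positive
    [0.80, 0.15, 0.05],  # confident, correct negative
    [0.45, 0.40, 0.15],  # uncertain, misses a true positive
    [0.10, 0.85, 0.05],  # confident, correct positive
    [0.30, 0.30, 0.40],  # uncertain, undetermined
]
toy_labels = [1, 0, 1, 1, 2]

full_coverage, full_recall = retained_recall(toy_probs, toy_labels, threshold=0.0)
kept_coverage, kept_recall = retained_recall(toy_probs, toy_labels, threshold=0.6)
```

In this toy case the only missed positive is also a low-confidence prediction, so deferring it raises recall on the retained set while lowering the share of reports handled automatically. The threshold controls that trade-off between coverage and recall.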

Discussion: A smaller deep learning architecture trained on task-specific data outperforms a general LLM. Equally critical to model performance is the incorporation of uncertainty quantification, achieved here through an abstention mechanism.

Conclusions: This study’s findings demonstrate the feasibility of developing algorithms to automatically identify metastatic cancer cases from unstructured pathology reports.