
FORGE: Pre-Training Open Foundation Models for Science

by Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar
Publication Type
Conference Paper
Book Title
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Page Numbers
1 to 13
Publisher Location
New York, New York, United States of America
Conference Name
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)
Conference Location
Denver, Colorado, United States of America

Large language models (LLMs) are poised to revolutionize the way we conduct scientific research. However, both model complexity and pre-training cost are impeding effective adoption by the wider science community. Identifying suitable scientific use cases, finding the optimal balance between model and data sizes, and scaling up model training are among the most pressing issues that need to be addressed. In this study, we provide practical solutions for building and using LLM-based foundation models targeting scientific research use cases. We present an end-to-end examination of the effectiveness of LLMs in scientific research, including their scaling behavior and computational requirements on Frontier, the first exascale supercomputer. We have also developed, for release to the scientific community, a suite of open foundation models called FORGE, with up to 26B parameters and pre-trained on 257B tokens from over 200M scientific articles, with performance on par with or superior to comparable state-of-the-art models. We have demonstrated the use and effectiveness of FORGE on scientific downstream tasks. Our research establishes best practices that can be applied across various fields to take advantage of LLMs for scientific discovery.