
High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function...

Publication Type
Conference Paper
Book Title
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
Page Numbers
46 to 57
Publisher Location
United States of America
Conference Name
7th Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
Conference Location
St. Louis, Missouri, United States of America

Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, machine learning has also benefited significantly from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates several machine-learning approaches for structure-based functional annotation of proteins at the scale of whole genomes. Our pipeline makes extensive use of deep learning and offers computational insights into best practices for training advanced deep-learning models on high-throughput data such as proteomics data. We showcase the methodologies our pipeline currently supports and outline future capabilities it will encompass, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.