Machine Learning to model the full space of chemical biology and drug discovery with quantum-mechanical accuracy
Investigator: Thomas Miller
Affiliation: Entos, Inc
This project uses Entos's new machine learning (ML) model to screen drug molecules for affinity with protein targets associated with COVID-19 infection. The goal is to span the full space of chemical biology and drug discovery with quantum-mechanical accuracy at greatly reduced computational cost.
The team is using Cori to perform a vast set of quantum chemistry density functional theory calculations to generate a large training dataset that includes energies, gradients, and other ML features.
The urgency of the COVID-19 pandemic demands that successful treatments be produced on a timescale of months, not the years or decades typical of drug- and vaccine-development efforts. A key bottleneck in identifying promising COVID-19 drug candidates is the accurate and efficient screening of drug molecules for affinity with protein targets associated with COVID-19 infection and the body's physiological response. Entos aims to break this bottleneck by developing quantum-mechanical ML features that encode and describe molecular properties with unprecedented accuracy and efficiency. The company has developed ML methods that have been shown to accelerate quantum simulations 1000-fold with negligible loss in accuracy.
The team is working to extend this ML model to span the full space of chemical biology and drug discovery with quantum-mechanical accuracy at force-field-level cost. The scope of this effort is ambitious: it will involve training a flagship graph-neural-network model of unprecedented scale (more than 8 billion parameters) on a large GPU system such as NVIDIA's Selene. The project also requires training data from a vast set of quantum chemistry calculations (100 million to 1 billion density functional theory calculations using the Entos Qcore electronic structure package to generate energies, gradients, and other ML features), which are best suited to a KNL (Intel Xeon Phi "Knights Landing") system such as NERSC's Cori. The data from these calculations will be made public and are expected to yield publishable results in fundamental research.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.