Active Learning and Uncertainty Quantification for Machine Learning Interatomic Potentials

Abstract

Molecular dynamics (MD) simulations typically rely on interatomic potentials (IAPs) constructed from empirical and physical considerations rather than from expensive ab initio quantum mechanical (QM) computations. In recent years, major advances in algorithms and computational power have enabled highly accurate data-driven approximations using machine learning (ML) methods. Machine-learned IAPs (MLIAPs) sacrifice some QM accuracy but make it feasible to simulate systems with up to a million atoms.

MLIAPs range from simple linear basis expansions to highly parameterized forms such as neural networks (NNs); in every case, they encode the functional relationship between atomic configuration and QM-derived potential energy. Since the associated QM calculations are computationally expensive, it is essential to achieve the highest possible MLIAP accuracy with as few QM calculations as possible.
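As an illustration of the simplest end of this range, a linear MLIAP can write the energy of a configuration as a sum over atoms of per-atom descriptors weighted by shared coefficients, so that fitting reduces to linear least squares. The sketch below is a minimal example under stated assumptions: the descriptors are random numbers standing in for real ones (e.g. bispectrum components), the reference energies are synthetic rather than QM data, and the array names and the `predict_energy` helper are hypothetical.

```python
import numpy as np

# Hypothetical setup: random per-atom descriptors stand in for real ones.
rng = np.random.default_rng(0)
n_configs, n_atoms, n_desc = 50, 8, 5

# descriptors[i, j, k] = k-th descriptor of atom j in configuration i
descriptors = rng.normal(size=(n_configs, n_atoms, n_desc))

# Linear MLIAP: E_i = sum_j sum_k c_k * B_k(atom j) = Phi[i] @ c,
# where Phi[i] sums the per-atom descriptors over configuration i.
Phi = descriptors.sum(axis=1)  # shape (n_configs, n_desc)

# Synthetic "reference" energies from known coefficients plus small noise.
c_true = np.array([1.0, -0.5, 0.25, 2.0, 0.0])
energies = Phi @ c_true + rng.normal(scale=1e-3, size=n_configs)

# Fit the coefficients by ordinary least squares.
c_fit, *_ = np.linalg.lstsq(Phi, energies, rcond=None)


def predict_energy(config_descriptors, coeffs):
    """Energy of one configuration from its (n_atoms, n_desc) descriptors."""
    return config_descriptors.sum(axis=0) @ coeffs


print(np.round(c_fit, 3))
```

Because the energy is linear in the coefficients, the same descriptor matrix `Phi` is also what the closed-form Bayesian treatment of linear MLIAPs operates on.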

In this work, we develop and deploy active learning methods to guide the selection of training data for MLIAP construction. Active learning relies largely on estimates of the uncertainty in MLIAP predictions. Besides facilitating active learning, uncertainty quantification (UQ) for MLIAPs is also useful for selecting MLIAP models of optimal complexity, thereby reducing the risk of overfitting. Furthermore, MLIAPs equipped with UQ enable the propagation of uncertainty through MD simulations, providing uncertainty estimates on MD simulation outputs.
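One generic form of an uncertainty-driven active learning loop is sketched below. It assumes a linear model, a synthetic candidate pool, and a hypothetical `qm_oracle` function standing in for the expensive QM calculation. Uncertainty is estimated as the spread of a bootstrap committee of least-squares fits (a simple query-by-committee variant); at each step the candidate with the largest spread is labeled and added to the training set. This is an illustration of the general idea, not the specific selection strategy used in this work.

```python
import numpy as np

rng = np.random.default_rng(1)
n_desc = 4
c_true = rng.normal(size=n_desc)


def qm_oracle(X):
    """Hypothetical stand-in for an expensive QM calculation."""
    return X @ c_true + rng.normal(scale=0.01, size=len(X))


def fit_committee(X, y, n_members=8):
    """Fit a committee of least-squares models on bootstrap resamples."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))
        coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        members.append(coef)
    return np.array(members)  # shape (n_members, n_desc)


def committee_std(members, X):
    """Committee disagreement: std of member predictions per candidate."""
    preds = X @ members.T  # shape (n_candidates, n_members)
    return preds.std(axis=1)


# Small initial training set and a larger unlabeled candidate pool.
X_train = rng.normal(size=(6, n_desc))
y_train = qm_oracle(X_train)
pool = rng.normal(scale=3.0, size=(500, n_desc))

for step in range(10):
    members = fit_committee(X_train, y_train)
    sigma = committee_std(members, pool)
    pick = int(np.argmax(sigma))  # most uncertain candidate
    x_new = pool[pick:pick + 1]
    X_train = np.vstack([X_train, x_new])
    y_train = np.concatenate([y_train, qm_oracle(x_new)])
    pool = np.delete(pool, pick, axis=0)

# Final model: plain least squares on all labeled data.
final, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print(len(X_train), np.abs(final - c_true).max())
```

Only the selected candidates are passed to `qm_oracle`, which is where the savings in QM calculations would come from; the committee size and number of steps here are arbitrary choices.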

Conventional active learning methods rely on empirical uncertainty estimates from ensemble techniques such as query-by-committee. Beyond these, we explore Bayesian inference to obtain posterior probability density functions (PDFs) on MLIAP parameters, thereby quantifying their joint uncertainty. For linear-expansion MLIAPs, these PDFs can be obtained in closed form, whereas non-linear parametric forms or non-Gaussian likelihoods require sampling methods such as Markov chain Monte Carlo (MCMC). MCMC, however, faces insurmountable computational challenges for highly overparameterized MLIAPs, such as those in NN form. In such cases, approximate parameterized posterior PDFs can be found via variational inference, at the cost of some underestimation of uncertainty for extrapolatory predictions.
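For the linear case, a standard closed-form result applies under the assumption of a Gaussian prior c ~ N(0, α⁻¹I) on the coefficients and Gaussian observation noise with precision β: the posterior is Gaussian with covariance Σ = (αI + βΦᵀΦ)⁻¹ and mean μ = βΣΦᵀy, and the predictive variance at a new descriptor row φ is 1/β + φᵀΣφ. The sketch below assumes fixed, known α and β and synthetic data; the function names are hypothetical.

```python
import numpy as np


def bayes_linear_posterior(Phi, y, alpha, beta):
    """Closed-form Gaussian posterior for the coefficients of a linear model.

    Prior: c ~ N(0, alpha^-1 I); likelihood: y ~ N(Phi c, beta^-1 I).
    Returns the posterior mean mu and covariance Sigma.
    """
    n_desc = Phi.shape[1]
    precision = alpha * np.eye(n_desc) + beta * Phi.T @ Phi
    Sigma = np.linalg.inv(precision)  # fine for small n_desc
    mu = beta * Sigma @ Phi.T @ y
    return mu, Sigma


def predictive(phi_new, mu, Sigma, beta):
    """Predictive mean and variance at new descriptor rows phi_new."""
    mean = phi_new @ mu
    # Noise variance plus parametric variance phi^T Sigma phi for each row.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_new, Sigma, phi_new)
    return mean, var


rng = np.random.default_rng(2)
Phi = rng.normal(size=(30, 3))
c_true = np.array([0.5, -1.0, 2.0])
beta = 1.0 / 0.05**2  # assumed known noise precision
y = Phi @ c_true + rng.normal(scale=0.05, size=30)

mu, Sigma = bayes_linear_posterior(Phi, y, alpha=1e-2, beta=beta)

# A point near the training data vs. a far extrapolation.
phi_test = np.array([[0.1, 0.2, -0.1], [10.0, -10.0, 10.0]])
mean, var = predictive(phi_test, mu, Sigma, beta)
print(mean, np.sqrt(var))
```

The posterior mean coincides with the ridge-regression solution with penalty α/β, so the Bayesian treatment adds the covariance Σ at essentially no extra fitting cost. The φᵀΣφ term grows for descriptors far from the training data, which is the kind of signal an active learning loop can exploit.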

In this talk, we will discuss our work on a range of UQ approaches for MLIAPs, from both the active learning (optimal data selection) and the model selection viewpoints. We will demonstrate results on chemical systems of interest for materials science applications.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Date
Sep 28, 2021
Event
MMLDT/CSET Mechanistic Machine Learning and Digital Twins for Computational Science, Engineering & Technology
Location
(virtual) San Diego, CA