Training and Generalization of Residual Neural Networks as Discrete Analogues of Neural ODEs

Abstract

Residual neural networks (ResNets) feature skip connections that enable incremental
learning from layer to layer. Their infinite-layer limit, Neural ODEs (NODEs),
can provide insight into the role of depth in network architecture. In addition,
the mature field of differential equations offers mathematical tools for analyzing
NODEs and, by extension, for better understanding ResNets and neural networks
in general, opening doors to better training algorithms
and improved generalization performance.
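For context, the standard correspondence behind this limit (a well-known observation
in the NODE literature, stated here only as background) reads the residual update as a
forward-Euler step of an ODE:

    \[ x_{k+1} = x_k + h\, f(x_k, \theta_k) \;\xrightarrow{\;h \to 0\;}\; \frac{dx}{dt} = f\bigl(x(t), \theta(t)\bigr), \]

so letting the number of layers grow while the step size h shrinks recovers the Neural ODE.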

Inspired by the continuous NODE analogy, we will examine parameterizing ResNet weight
matrices as functions of depth. The choice of parameterization constrains the capacity
of the network, acting as a regularizer and reducing the generalization gap.
Further, again drawing on the NODE analogy, we study the role of stiffness as
a means of regularization for ResNets. We define a discrete notion of
stiffness for ResNets and penalize it during training as a
regularization measure. We will demonstrate the methods on applications of
DOE/SNL interest, ranging from materials science to climate models.
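As a rough illustration of the two ideas, the sketch below (PyTorch; hypothetical names and
choices, not the speakers' code) parameterizes the layer weights as a low-order polynomial
in the normalized depth t_k = k / L and adds one possible discrete stiffness surrogate, the
mean squared change of the residual field between consecutive layers, to the training loss.
Both the polynomial form and this particular stiffness definition are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DepthParamResNet(nn.Module):
        """Residual net x_{k+1} = x_k + h * f(x_k, theta(t_k)), with theta(t)
        given by polynomial coefficients shared across layers (hypothetical choice)."""
        def __init__(self, dim, n_layers, degree=3):
            super().__init__()
            self.n_layers, self.h = n_layers, 1.0 / n_layers
            # Polynomial coefficients C_j: far fewer parameters than one matrix per layer.
            self.coeffs = nn.Parameter(torch.randn(degree + 1, dim, dim) * 0.01)
            self.bias = nn.Parameter(torch.zeros(dim))

        def weight_at(self, t):
            # theta(t) = sum_j t**j * C_j
            powers = torch.tensor([t ** j for j in range(self.coeffs.shape[0])])
            return torch.einsum('j,jab->ab', powers, self.coeffs)

        def forward(self, x):
            residuals = []
            for k in range(self.n_layers):
                W = self.weight_at(k / self.n_layers)
                f = torch.tanh(x @ W.T + self.bias)   # residual "velocity" f(x, theta(t_k))
                residuals.append(f)
                x = x + self.h * f                    # forward-Euler-like update
            return x, residuals

    def stiffness_penalty(residuals):
        # One possible discrete stiffness surrogate: how fast the residual field
        # changes from one layer to the next (large values suggest stiff dynamics).
        diffs = [((r1 - r0) ** 2).mean() for r0, r1 in zip(residuals[:-1], residuals[1:])]
        return torch.stack(diffs).mean()

    # Usage: add the penalty to the task loss during training (toy loss shown here).
    model = DepthParamResNet(dim=16, n_layers=20)
    x = torch.randn(8, 16)
    out, res = model(x)
    loss = out.pow(2).mean() + 1e-2 * stiffness_penalty(res)
    loss.backward()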


Shorter version:

Residual neural networks (ResNets) feature skip connections that enable incremental
learning from layer to layer. Inspired by the continuous Neural ODE analogy,
we will examine parameterizing ResNet weight matrices as functions of depth.
The choice of parameterization constrains the capacity of the network,
acting as a regularizer and reducing the generalization gap.
We will demonstrate the methods on applications of DOE/SNL interest,
ranging from materials science to climate models.

Date
Jul 26, 2022
Event
MLDL Workshop, Sandia
Location
(virtual) Albuquerque, NM