
Dokl. RAN. Math. Inf. Proc. Upr., 2022 Volume 508, Pages 50–69 (Mi danma337)

This article is cited in 1 paper

ADVANCED STUDIES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Loss function dynamics and landscape for deep neural networks trained with quadratic loss

M. S. Nakhodnov (a), M. S. Kodryan (b), E. M. Lobacheva (b), D. S. Vetrov (a,b)

(a) Artificial Intelligence Research Institute, Moscow, Russia
(b) HSE University, Moscow, Russia

Abstract: Knowledge of the loss landscape geometry makes it possible to explain the behavior of neural networks, the dynamics of their training, and the relationship between the resulting solutions and hyperparameters such as the regularization method, the network architecture, or the learning rate schedule. In this paper, the training dynamics and the loss surface are studied for both the standard cross-entropy loss and the currently popular mean squared error (MSE) loss in scale-invariant networks with normalization. The scale symmetry is eliminated by passing to optimization on a sphere. As a result, depending on the effective learning rate on the sphere, three training phases with fundamentally different properties are revealed: the convergence phase, the phase of chaotic equilibrium, and the phase of destabilized training. These phases are observed for both loss functions, but in the case of the MSE loss, larger networks and longer training are required to reach the convergence phase.
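The scale invariance central to the abstract can be illustrated numerically. The sketch below (not the authors' code; the toy normalized unit and quadratic loss are illustrative assumptions) checks two facts underlying the transition to optimization on a sphere: a normalized unit's output depends only on the direction of its weight vector, and consequently the gradient scales inversely with the weight norm, so the effective step on the sphere is governed by lr / ||w||^2.

```python
import numpy as np

def normalized_logit(w, x):
    # Toy "batch-normalized" unit: output is invariant to positive
    # rescaling of w, since mean and std of z scale together with it.
    z = x @ w
    return (z - z.mean()) / (z.std() + 1e-12)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))
w = rng.normal(size=10)

out1 = normalized_logit(w, x)
out2 = normalized_logit(3.7 * w, x)  # rescaling w leaves the output unchanged

def num_grad(f, w, eps=1e-6):
    # Central-difference numerical gradient of a scalar function f(w).
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# A quadratic (MSE-style) loss on the normalized output; for any
# scale-invariant f, grad f(a*w) = grad f(w) / a.
loss = lambda v: float(((normalized_logit(v, x) - 1.0) ** 2).mean())
g1 = num_grad(loss, w)
g2 = num_grad(loss, 3.7 * w)
```

Because `g2 = g1 / 3.7`, a fixed Euclidean learning rate moves weights of large norm by a much smaller angle, which is why the paper's analysis is phrased in terms of the step size on the sphere.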

Keywords: scale invariance, batch normalization, training of neural networks, optimization, MSE loss function.

UDC: 004.8

Presented: A. A. Shananin
Received: 28.10.2022
Revised: 28.10.2022
Accepted: 01.11.2022

DOI: 10.31857/S2686954322070189


English version:
Doklady Mathematics, 2022, 106:suppl. 1, S43–S62


© Steklov Math. Inst. of RAS, 2024