
Theor. Appl. Mech., 2025 Volume 52, Issue 1, Pages 67–73 (Mi tam152)

On non-approximability of zero loss global $\mathcal{L}^2$ minimizers by gradient descent in deep learning

Thomas Chen, Patricia Muñoz Ewald

Department of Mathematics, University of Texas at Austin, Austin TX, USA

Abstract: We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) and discuss in detail the fact that, in underparametrized DL networks, zero loss minimization generically cannot be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [2, 3] and for gradient descent [1], which assume clustering of the training data.
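To make the underparametrization claim concrete, the following is a minimal, hypothetical sketch (not taken from the paper, and not the authors' construction): a two-parameter linear model trained by plain gradient descent on the $\mathcal{L}^2$ loss. For generic training points the loss stays bounded away from zero, whereas for non-generic data (here, points lying exactly on one line) zero loss is attained. All names and numerical choices below are illustrative assumptions.

```python
# Illustrative sketch (assumption, not the paper's method): an underparametrized
# model y = w*x + b has only 2 parameters. With more than 2 generic training
# points, the squared loss cannot reach zero; gradient descent stalls at a
# positive residual. For non-generic data lying exactly on a line, zero loss
# is attainable.
import numpy as np

rng = np.random.default_rng(0)

def gd_loss(x, y, steps=20000, lr=1e-2):
    """Run plain gradient descent on the mean squared loss of y ~ w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        r = w * x + b - y              # residuals
        w -= lr * 2 * np.mean(r * x)   # gradient step in w
        b -= lr * 2 * np.mean(r)       # gradient step in b
    return np.mean((w * x + b - y) ** 2)

# Generic data: 5 random points in general position -> loss stays positive.
x_gen = rng.normal(size=5)
y_gen = rng.normal(size=5)

# Non-generic data: all 5 points lie exactly on one line -> loss reaches zero.
x_lin = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_lin = 3.0 * x_lin + 1.0

print("generic data, final loss:    ", gd_loss(x_gen, y_gen))
print("non-generic data, final loss:", gd_loss(x_lin, y_lin))
```

In this toy setting the positive residual for generic data is just the least-squares defect of an overdetermined system; the paper's point is the analogous statement for underparametrized DL networks.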

Keywords: deep learning, underparametrization, generic training data, zero loss.

MSC: 57R70, 62M45

Received: 21.01.2025
Accepted: 05.05.2025

Language: English

DOI: 10.2298/TAM250121008C



© Steklov Math. Inst. of RAS, 2025