Abstract:
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and discuss in detail the circumstance that, in underparametrized DL networks, zero loss minimization cannot generically be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [2, 3] and for gradient descent [1], which assume clustering of training data.
Keywords: deep learning, underparametrization, generic training data, zero loss.