SEMINARS
Steklov Mathematical Institute Seminar
Physical Principles in Machine Learning: How to Explain Grokking

S. V. Kozyrev
Steklov Mathematical Institute of Russian Academy of Sciences, Moscow
Abstract: Physics-like models in learning theory will be discussed. Grokking (delayed generalization) is a phenomenon in the learning theory of overparameterized systems (i.e., systems with a large number of parameters) on algorithmic learning problems (e.g., learning multiplication). During grokking, the system quickly memorizes the training set (e.g., half of the multiplication table) but initially gives incorrect answers on the test set (the other half of the multiplication table). Then, as the stochastic gradient descent procedure continues, grokking (delayed generalization) occurs: the system begins to give correct answers to questions from the test set. In this talk, stochastic gradient descent will be considered as Brownian motion, and grokking will be explained as a manifestation of the second law of thermodynamics and of Eyring's formula from kinetic theory. The presentation will follow the preprint S. V. Kozyrev, How to explain grokking, arXiv:2412.18624.
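The thermodynamic picture sketched in the abstract can be illustrated with a minimal toy model (not taken from the preprint; all names and parameter values here are illustrative). Treating the training noise as Brownian motion, the "memorized" state corresponds to a metastable minimum of a potential, and delayed generalization corresponds to a noise-driven escape over a barrier, whose expected waiting time grows like exp(ΔV/T), as in Eyring's (or the Arrhenius/Kramers) formula. The sketch below simulates overdamped Langevin dynamics in a double-well potential V(x) = (x² − 1)² and compares mean escape times at two temperatures:

```python
import math
import random

def escape_time(temperature, dt=0.01, max_steps=2_000_000, rng=None):
    """First-passage time out of the left well of V(x) = (x^2 - 1)^2.

    Overdamped Langevin dynamics: dx = -V'(x) dt + sqrt(2 T dt) * xi,
    started at the 'memorization' minimum x = -1; escape is declared
    once x crosses 0.5, past the barrier at x = 0 (barrier height 1).
    """
    rng = rng or random.Random(0)
    x = -1.0
    noise_amp = math.sqrt(2.0 * temperature * dt)
    for step in range(max_steps):
        grad = 4.0 * x * (x * x - 1.0)  # V'(x)
        x += -grad * dt + noise_amp * rng.gauss(0.0, 1.0)
        if x > 0.5:
            return step * dt
    return math.inf  # no escape within the step budget

rng = random.Random(42)
trials = 20
mean_cold = sum(escape_time(0.2, rng=rng) for _ in range(trials)) / trials
mean_hot = sum(escape_time(0.4, rng=rng) for _ in range(trials)) / trials
print(f"mean escape time at T=0.2: {mean_cold:.1f}")
print(f"mean escape time at T=0.4: {mean_hot:.1f}")
```

With barrier height ΔV = 1, halving the temperature from 0.4 to 0.2 should lengthen the mean escape time by roughly a factor of exp(1/0.2 − 1/0.4) ≈ 12, which the simulation reproduces up to sampling noise. The analogy is only heuristic: real grokking happens in a high-dimensional loss landscape, and the entropy (number of parameter configurations) of the generalizing region, not just the barrier height, enters the argument.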