SEMINARS
Steklov Mathematical Institute Seminar
Physical Principles in Machine Learning: How to Explain Grokking

S. V. Kozyrev
Steklov Mathematical Institute of Russian Academy of Sciences, Moscow
Abstract: Physics-like models in learning theory will be discussed. Grokking (delayed generalization) is a phenomenon in the learning theory of overparameterized systems (i.e., systems with a large number of parameters) on algorithmic learning problems (e.g., learning multiplication). During grokking, the system quickly memorizes the training set (e.g., half of the multiplication table) but initially gives incorrect answers on the test set (the other half of the multiplication table). Then, as the stochastic gradient descent procedure continues, grokking (delayed generalization) occurs: the system begins to give correct answers to questions from the test set. In this talk, stochastic gradient descent will be considered as Brownian motion, and grokking will be explained as a manifestation of the second law of thermodynamics and of Eyring's formula from kinetic theory. The presentation will follow the preprint S. V. Kozyrev, How to explain grokking, arXiv:2412.18624.
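The thermodynamic picture sketched in the abstract can be illustrated with a minimal toy model (not taken from the preprint; all names and parameter values here are illustrative). Treating the training noise as Brownian motion, the "memorized" state corresponds to a metastable minimum of a potential, and delayed generalization corresponds to a noise-driven escape over a barrier, whose expected waiting time grows like exp(ΔV/T), as in Eyring's (or the Arrhenius/Kramers) formula. The sketch below simulates overdamped Langevin dynamics in a double-well potential V(x) = (x² − 1)² and compares mean escape times at two temperatures:

```python
import math
import random

def escape_time(temperature, dt=0.01, max_steps=2_000_000, rng=None):
    """First-passage time out of the left well of V(x) = (x^2 - 1)^2.

    Overdamped Langevin dynamics: dx = -V'(x) dt + sqrt(2 T dt) * xi,
    started at the 'memorization' minimum x = -1; escape is declared
    once x crosses 0.5, past the barrier at x = 0 (barrier height 1).
    """
    rng = rng or random.Random(0)
    x = -1.0
    noise_amp = math.sqrt(2.0 * temperature * dt)
    for step in range(max_steps):
        grad = 4.0 * x * (x * x - 1.0)  # V'(x)
        x += -grad * dt + noise_amp * rng.gauss(0.0, 1.0)
        if x > 0.5:
            return step * dt
    return math.inf  # no escape within the step budget

rng = random.Random(42)
trials = 20
mean_cold = sum(escape_time(0.2, rng=rng) for _ in range(trials)) / trials
mean_hot = sum(escape_time(0.4, rng=rng) for _ in range(trials)) / trials
print(f"mean escape time at T=0.2: {mean_cold:.1f}")
print(f"mean escape time at T=0.4: {mean_hot:.1f}")
```

With barrier height ΔV = 1, halving the temperature from 0.4 to 0.2 should lengthen the mean escape time by roughly a factor of exp(1/0.2 − 1/0.4) ≈ 12, which the simulation reproduces up to sampling noise. The analogy is only heuristic: real grokking happens in a high-dimensional loss landscape, and the entropy (number of parameter configurations) of the generalizing region, not just the barrier height, enters the argument.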