Abstract:
Distributed optimization plays an important role in modern large-scale
machine learning and data processing systems, where it helps to use
computational resources efficiently. One of the classical and popular
approaches is Local Stochastic Gradient Descent (Local SGD), in which each
device performs several local updates before the iterates are averaged;
this makes the method particularly useful in distributed environments for
reducing communication bottlenecks and improving scalability.
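For illustration, the Local SGD iteration can be sketched as follows (the notation below, with $M$ devices, stepsize $\gamma$, stochastic gradients $g_m^t$ and $H$ local steps between communications, is ours and is given only as an assumed example, not as the exact scheme analysed in the paper):
\[
x_m^{t+1} =
\begin{cases}
x_m^{t} - \gamma\, g_m^{t}, & (t+1) \bmod H \neq 0, \\[4pt]
\dfrac{1}{M} \displaystyle\sum_{j=1}^{M} \bigl( x_j^{t} - \gamma\, g_j^{t} \bigr), & (t+1) \bmod H = 0,
\end{cases}
\]
that is, every device makes $H$ local stochastic gradient steps, and the iterates are averaged over the devices only at communication rounds.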
A typical feature of this method is its dependence on the communication
frequency. However, for a quadratic objective function with data distributed
homogeneously across all devices, this influence of the communication
frequency vanishes. As a natural consequence, subsequent studies adopted
the assumption of a Lipschitz Hessian, since it measures, to a certain
extent, how close the optimized function is to a quadratic one. Nevertheless,
in order to make the theory of Local SGD more complete and unlock its
potential, in this paper we abandon the Lipschitz Hessian assumption and
introduce a new concept of approximate quadraticity. This assumption gives
a new perspective on problems whose properties are close to quadratic.
In addition, existing theoretical analyses of Local SGD often assume
bounded variance of the stochastic noise. We instead consider an unbounded
noise condition, which allows us to broaden the class of problems under study.
Bibliography: 36 titles.