A. E. Sadchikov, S. A. Chezhegov, A. N. Beznosikov, A. V. Gasnikov, “Local SGD for near-quadratic problems: Improving convergence under unconstrained noise conditions”, Uspekhi Mat. Nauk, 2024, Volume 79, Issue 6(480), Pages 83

Local SGD for near-quadratic problems: Improving convergence under unconstrained noise conditions

A. E. Sadchikov^a, S. A. Chezhegov^ab, A. N. Beznosikov^bcd, A. V. Gasnikov^abd

^a Moscow Institute of Physics and Technology (National Research University), Moscow, Russia
^b Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia
^c Sber AI Lab, Moscow, Russia
^d Innopolis University, Innopolis, Russia

Abstract: Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by making efficient use of computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD), in which each device performs multiple local updates before averaging; this is particularly useful in distributed environments for reducing communication bottlenecks and improving scalability. A typical feature of this method is that its convergence depends on the frequency of communications. However, for a quadratic objective function with data distributed homogeneously over all devices, the influence of the communication frequency vanishes. As a natural consequence, subsequent studies have assumed a Lipschitz Hessian, since this indicates that the optimized function is, to a certain extent, similar to a quadratic one. To make the theory of Local SGD more complete and unlock its potential, in this paper we abandon the Lipschitz Hessian assumption and introduce a new concept of approximate quadraticity. This assumption gives a new perspective on problems with near-quadratic properties. In addition, existing theoretical analyses of Local SGD often assume bounded variance. We instead consider an unbounded noise condition, which allows us to broaden the class of problems under study.
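To make the scheme described in the abstract concrete, the following is a minimal illustrative sketch of Local SGD on a purely quadratic objective with homogeneous data. It is not the authors' implementation. The function name `local_sgd`, all parameter names and values, and the test problem are hypothetical. The multiplicative noise model is also an assumption, chosen as one simple example of gradient noise whose variance grows with the gradient norm (a strong-growth-type condition) rather than being uniformly bounded. The paper's notion of approximate quadraticity is not modeled here.

```python
import numpy as np


def local_sgd(A, b, x0, num_workers=4, rounds=20, local_steps=5,
              lr=0.1, sigma=0.5, seed=0):
    """Illustrative Local SGD for f(x) = 0.5 * x^T A x - b^T x (A symmetric).

    Every worker sees the same objective (homogeneous data). Each stochastic
    gradient is the exact gradient scaled by (1 + sigma * xi), xi ~ N(0, 1),
    so its variance is proportional to ||grad f(x)||^2. This noise model is
    an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        # Every worker starts the round from the current averaged iterate.
        workers = np.tile(x, (num_workers, 1))          # shape (M, d)
        for _ in range(local_steps):
            # Row m holds A @ x_m - b; valid because A is symmetric.
            grads = workers @ A - b
            # One independent scalar noise factor per worker.
            xi = rng.standard_normal((num_workers, 1))
            workers -= lr * (1.0 + sigma * xi) * grads
        # Communication step: average the local iterates.
        x = workers.mean(axis=0)
    return x


# Hypothetical test problem: a small diagonal quadratic.
A = np.diag([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0, 1.0])
x_star = np.linalg.solve(A, b)   # exact minimizer
x0 = np.zeros(3)

x_hat = local_sgd(A, b, x0)
print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```

Varying `local_steps` while keeping `rounds * local_steps` fixed gives an informal way to observe how the communication frequency affects the final error in this quadratic, homogeneous setting.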
Bibliography: 36 titles.

Keywords: distributed optimization, quadraticity, strong growth condition.

UDC: 519.853.62

Received: 16.08.2024

Language: English

DOI: 10.4213/rm10207