J. Chen, A. V. Lobanov, A. V. Rogozin, “Distributed min-max optimization using the smoothing technique”, Computer Research and Modeling, 2023, Volume 15, Issue 2,Pages <nobr>469

MATHEMATICAL MODELING AND NUMERICAL SIMULATION

Distributed min-max optimization using the smoothing technique

J. Chen, A. V. Lobanov, A. V. Rogozin

Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow region, 141701, Russia

Abstract: Distributed saddle point problems (SPPs) have numerous applications in optimization, matrix games and machine learning. For example, the training of generated adversarial networks is represented as a min-max optimization problem, and training regularized linear models can be reformulated as an SPP as well. This paper studies distributed nonsmooth SPPs with Lipschitz-continuous objective functions. The objective function is represented as a sum of several components that are distributed between groups of computational nodes. The nodes, or agents, exchange information through some communication network that may be centralized or decentralized. A centralized network has a universal information aggregator (a server, or master node) that directly communicates to each of the agents and therefore can coordinate the optimization process. In a decentralized network, all the nodes are equal, the server node is not present, and each agent only communicates to its immediate neighbors.
We assume that each of the nodes locally holds its objective and can compute its value at given points, i. e. has access to zero-order oracle. Zero-order information is used when the gradient of the function is costly, not possible to compute or when the function is not differentiable. For example, in reinforcement learning one needs to generate a trajectory to evaluate the current policy. This policy evaluation process can be interpreted as the computation of the function value. We propose an approach that uses a smoothing technique, i. e., applies a first-order method to the smoothed version of the initial function. It can be shown that the stochastic gradient of the smoothed function can be viewed as a random two-point gradient approximation of the initial function. Smoothing approaches have been studied for distributed zero-order minimization, and our paper generalizes the smoothing technique on SPPs.

Keywords: convex optimization, distributed optimization.

UDC: 519.8

Received: 19.02.2023
Accepted: 23.02.2023

Language: English

DOI: 10.20537/2076-7633-2023-15-2-469-480