Avtomat. i Telemekh., 2025 Issue 1, Pages 80–98 (Mi at16478)

Intellectual Control Systems, Data Analysis

On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning

I. A. Chistiakov

Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, Moscow, Russia

Abstract: We consider a target control problem of a special form, in which a system of differential equations includes nonlinear terms depending on state variables. We show that reinforcement learning algorithms such as Proximal Policy Optimization (PPO) can be used to find an inexact feedback solution. The chosen strategy is further approximated with a piecewise affine control. Based on the dynamic programming method, an inner estimate of the solvability set is calculated, as well as a corresponding a priori estimate of the distance between a final trajectory point and the target set. To do this, we examine an auxiliary problem for a piecewise linear system with noise and calculate a piecewise quadratic function as an approximate solution of the Hamilton–Jacobi–Bellman equation.
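Illustration (not taken from the paper): the step of approximating a chosen strategy with a piecewise affine control can be sketched in one dimension as below. Everything here is an assumption made for illustration only: the scalar state, the uniform grid of knots, the use of linear interpolation, and the function `policy`, which is a hypothetical stand-in for a strategy learned with PPO. The paper's actual construction and its error estimates may differ.

```python
import numpy as np


def piecewise_affine_fit(policy, x_min, x_max, n_pieces):
    """Return knots and a piecewise affine interpolant of `policy` on [x_min, x_max].

    Hypothetical helper: the interpolant is affine on each of the `n_pieces`
    equal subintervals and matches `policy` exactly at the knots.
    """
    knots = np.linspace(x_min, x_max, n_pieces + 1)
    values = policy(knots)

    def u_pa(x):
        # np.interp performs linear interpolation between neighbouring knots.
        return np.interp(x, knots, values)

    return knots, u_pa


def policy(x):
    # Assumed smooth feedback law standing in for a learned strategy.
    return -np.tanh(2.0 * x)


knots, u_pa = piecewise_affine_fit(policy, -2.0, 2.0, 8)

# Worst-case gap between the strategy and its approximation on a fine grid.
grid = np.linspace(-2.0, 2.0, 1001)
max_err = float(np.max(np.abs(policy(grid) - u_pa(grid))))
```

In a sketch like this, the gap `max_err` plays the role of a bounded disturbance: it is one plausible way to read the abstract's "piecewise linear system with noise", although the source does not spell out that correspondence.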

Keywords: nonlinear dynamics, dynamic programming, comparison principle, linearization, piecewise quadratic value function, reinforcement learning, PPO algorithm, solvability set.

Presented by Editorial Board member P. V. Pakshin

Received: 29.08.2023
Revised: 14.10.2024
Accepted: 29.10.2024

DOI: 10.31857/S0005231025010057



© Steklov Math. Inst. of RAS, 2025