Abstract:
We consider a target control problem of a special form in which the system of differential equations contains nonlinear terms depending on the state variables. We show that reinforcement learning algorithms such as Proximal Policy Optimization (PPO) can be used to find an inexact feedback solution. The resulting strategy is then approximated by a piecewise affine control. Based on the dynamic programming method, we compute an inner estimate of the solvability set together with a corresponding a priori estimate of the distance between the final point of the trajectory and the target set. To this end, we examine an auxiliary problem for a piecewise linear system with noise and compute a piecewise quadratic function as an approximate solution of the Hamilton–Jacobi–Bellman equation.
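To illustrate the step of replacing a learned feedback with a piecewise affine control, the sketch below fits an affine map u ≈ K_c x + b_c by least squares on each cell of a uniform grid over a two-dimensional box. This is not the authors' procedure: the abstract does not specify the partition or the fitting rule. The function `policy` is a hypothetical smooth stand-in for a trained PPO policy, and the grid, the sample counts, and the least-squares fit are all assumptions made for illustration.

```python
import numpy as np

def policy(x):
    # Hypothetical stand-in for a trained PPO feedback u = pi(x), x in R^2.
    return np.tanh(-2.0 * x[..., 0] - x[..., 1])

def fit_piecewise_affine(policy, lo, hi, n_cells, n_samples=200, seed=0):
    """Fit u ~ K_c x + b_c by least squares on each cell of a uniform
    n_cells x n_cells grid over the box [lo, hi] (assumed partition)."""
    rng = np.random.default_rng(seed)
    edges = [np.linspace(lo[d], hi[d], n_cells + 1) for d in range(2)]
    gains = {}
    for i in range(n_cells):
        for j in range(n_cells):
            a = np.array([edges[0][i], edges[1][j]])
            b = np.array([edges[0][i + 1], edges[1][j + 1]])
            # Sample states inside the cell and query the policy there.
            X = rng.uniform(a, b, size=(n_samples, 2))
            A = np.hstack([X, np.ones((n_samples, 1))])  # columns: x1, x2, 1
            coef, *_ = np.linalg.lstsq(A, policy(X), rcond=None)
            gains[(i, j)] = coef  # [K_c1, K_c2, b_c]
    return edges, gains

def pwa_control(x, edges, gains):
    """Evaluate the piecewise affine control at a single state x."""
    n = len(edges[0]) - 1
    # Locate the cell containing x; states on the outer boundary are clipped in.
    idx = tuple(
        int(np.clip(np.searchsorted(edges[d], x[d], side="right") - 1, 0, n - 1))
        for d in range(2)
    )
    coef = gains[idx]
    return float(coef[:2] @ x + coef[2])
```

For example, `edges, gains = fit_piecewise_affine(policy, np.array([-1.0, -1.0]), np.array([1.0, 1.0]), n_cells=8)` followed by `pwa_control(np.array([0.3, -0.2]), edges, gains)` evaluates the approximate control at one state. A piecewise affine feedback of this kind is what makes the closed-loop system piecewise linear (up to the affine offsets), which is the structure the auxiliary problem above relies on.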