
News of the Kabardino-Balkarian Scientific Center of the Russian Academy of Sciences, 2025 Volume 27, Issue 2, Pages 86–102 (Mi izkab938)

Computer science and information processes

On the application of reinforcement learning in the task of choosing the optimal trajectory

M. G. Gorodnichev

Moscow Technical University of Communications and Informatics, 111024, Russia, Moscow, 8A Aviamotornaya street

Abstract: This paper reviews state-of-the-art reinforcement learning methods, with a focus on their application in dynamic and complex environments. The study begins by analysing the main approaches to reinforcement learning, such as dynamic programming, Monte Carlo methods, temporal-difference methods and policy gradients. Special attention is given to Generative Adversarial Imitation Learning (GAIL) and its impact on the optimisation of agents' strategies. Model-free learning is examined, and criteria are identified for selecting agents capable of operating in continuous action and state spaces. The experimental part analyses the training of agents equipped with different types of sensors, including visual sensors, and demonstrates their ability to adapt to the environment despite resolution constraints. Results are compared in terms of cumulative reward and episode length, revealing improved agent performance in the later stages of training. The study confirms that imitation learning significantly improves agent performance by reducing time costs and improving decision-making strategies. The work opens the way to further research on improving sensor resolution and fine-tuning hyperparameters.

Keywords: reinforcement learning, intelligent agents, optimal trajectory, highly automated vehicles, policy-based learning, actor-critic architectures, imitation learning, sensors, continuous states, discrete states, PPO, SAC

UDC: 004.852

MSC: 68T07

Received: 25.03.2025
Revised: 26.03.2025
Accepted: 09.04.2025

DOI: 10.35330/1991-6639-2025-27-2-86-102




