Informatics and Automation, 2024, Volume 23, Issue 5, Pages 1485–1504 (Mi trspy1331)

Robotics, Automation and Control Systems

Implicit understanding: decoding swarm behaviors in robots through deep inverse reinforcement learning

A. Iskandar (a), A. Hammoud (b), B. Kovács (a)

(a) University of Miskolc
(b) Federal State Budgetary Educational Institution of Higher Education “Kuban State Agrarian University named after I.T. Trubilin”

Abstract: Reinforcement learning is a common approach to generating the collective behavior of swarm robots. Yet formulating a reward function that aligns with specific objectives remains a significant challenge, particularly as task complexity increases. In this paper, we develop a deep inverse reinforcement learning model that uncovers, from demonstrations, the reward structures guiding autonomous robots in achieving tasks. Deep inverse reinforcement learning models are particularly well suited to complex and dynamic environments where predefined reward functions may be difficult to specify. Our model can generate different collective behaviors according to the required objectives and copes effectively with continuous state and action spaces, ensuring a nuanced recovery of reward structures. We tested the model using E-puck robots in the Webots simulator on two tasks: searching for dispersed boxes and navigating to a predefined position. Reward recovery depends on demonstrations collected from an intelligent swarm, pre-trained using reinforcement learning, that acts as the expert. The results show successful recovery of rewards from both segmented and continuous demonstrations for the two behaviors, searching and navigation. Comparing the swarm behaviors learned by the expert and by the proposed model shows that the model does not merely clone the expert behavior but generates its own strategies to achieve the system’s objectives.

Keywords: deep inverse reinforcement learning, reward function, demonstrations, searching behavior, navigation behavior.

UDC: 006.72

Received: 29.05.2024

Language: English

DOI: 10.15622/ia.23.5.8


