Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025 Volume 527, Pages 459–470 (Mi danma701)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Mind and motion aligned: a joint evaluation ISAACSIM benchmark for task planning and low-level policies in mobile manipulation

N. È. Kachaevᵃ, A. N. Spiridonovᵃ, A. S. Gorodetskyᵃ, K. F. Muravievᵇᶜ, N. S. Oskolkovᶜ, A. Narendraᶜ, V. I. Shakhuroᵃᵈ, D. A. Makarovᵇᶜ, A. I. Panovᵃᶜ, P. D. Fedotovaᵉᶠ, A. K. Kovalevᵃᶜ

a Artificial Intelligence Research Institute, Moscow
b Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow
c Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow Region
d Lomonosov Moscow State University
e SberRoboticsCenter, Moscow
f Skolkovo Institute of Science and Technology

Abstract: Benchmarks are crucial for evaluating progress in robotics and embodied AI. However, a significant gap exists between benchmarks designed for high-level language instruction following, which often assume perfect low-level execution, and benchmarks for low-level robot control, which rely on simple, one-step commands. This disconnect prevents a comprehensive evaluation of integrated systems in which both task planning and physical execution are critical. To address this, we propose Kitchen-R, a novel benchmark that unifies the evaluation of task planning and low-level control within a simulated kitchen environment. Built as a digital twin in the Isaac Sim simulator, Kitchen-R features more than 500 complex language instructions and supports a mobile manipulator robot. We provide baseline methods for the benchmark, including a task-planning strategy based on a vision-language model and a low-level control policy based on Diffusion Policy. We also provide a trajectory collection system. The benchmark offers a flexible framework with three evaluation modes: independent assessment of the planning module, independent assessment of the control policy, and, crucially, integrated evaluation of the whole system. Kitchen-R thus bridges a key gap in embodied AI research, enabling more holistic and realistic benchmarking of language-guided robotic agents.
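To make the difference between the three evaluation modes concrete, here is a minimal Python sketch under stated assumptions. It is not Kitchen-R's code or API: the names `Episode`, `execute`, `eval_planning`, `eval_control` and `eval_integrated`, the representation of a plan as an ordered list of subtask strings, and the exact-match scoring are all hypothetical. The stub planner and policy stand in for the vision-language-model planner and the Diffusion Policy controller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical types: a plan is an ordered list of subtask strings.
Plan = List[str]
Planner = Callable[[str], Plan]  # instruction -> plan (stand-in for a VLM planner)
Policy = Callable[[str], bool]   # subtask -> True if executed successfully


@dataclass
class Episode:
    instruction: str
    reference_plan: Plan


def execute(policy: Policy, plan: Plan) -> bool:
    """Run subtasks in order; all() stops at the first failed subtask."""
    return all(policy(step) for step in plan)


def eval_planning(planner: Planner, episodes: Sequence[Episode]) -> float:
    """Mode 1: planner alone, assuming perfect low-level execution."""
    hits = sum(planner(ep.instruction) == ep.reference_plan for ep in episodes)
    return hits / len(episodes)


def eval_control(policy: Policy, episodes: Sequence[Episode]) -> float:
    """Mode 2: policy alone, fed the reference subtasks instead of a planner."""
    hits = sum(execute(policy, ep.reference_plan) for ep in episodes)
    return hits / len(episodes)


def eval_integrated(planner: Planner, policy: Policy,
                    episodes: Sequence[Episode]) -> float:
    """Mode 3: the policy executes whatever plan the planner produced.

    Toy goal check (an assumption): success requires every generated step to
    execute and the generated plan to match the reference. A simulator would
    instead inspect the final scene state.
    """
    hits = 0
    for ep in episodes:
        plan = planner(ep.instruction)
        executed = execute(policy, plan)
        if executed and plan == ep.reference_plan:
            hits += 1
    return hits / len(episodes)


# Toy usage with hypothetical episodes, planner and policy.
episodes = [
    Episode("put the cup in the sink",
            ["go to table", "pick cup", "go to sink", "place cup"]),
    Episode("open the fridge", ["go to fridge", "open fridge"]),
]
plans = {
    "put the cup in the sink": ["go to table", "pick cup", "go to sink", "place cup"],
    "open the fridge": ["open fridge"],  # planner forgets to navigate first
}


def planner(instruction: str) -> Plan:
    return plans[instruction]


def policy(step: str) -> bool:
    return step != "pick cup"  # grasping is the only failing skill


print(eval_planning(planner, episodes))            # 0.5
print(eval_control(policy, episodes))              # 0.5
print(eval_integrated(planner, policy, episodes))  # 0.0
```

In this toy run each component scores 0.5 when evaluated alone, yet the combined system completes no episode, because the planning error and the execution error fall on different episodes. Only the integrated mode exposes that compounding.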

Keywords: benchmark, robotics, embodied AI, task planning, mobile manipulation, simulation.

UDC: 004.9

Received: 21.08.2025
Accepted: 28.09.2025

DOI: 10.7868/S2686954325070392





© Steklov Math. Inst. of RAS, 2025