Abstract:
This paper addresses the problem of predicting the position of a person in future frames of a video stream and reports in-depth experiments on applying traditional and state-of-the-art (SOTA) blocks to this task. We present an original architecture, KeyFNet, together with modifications based on transformer blocks, which predicts key-point coordinates in the video stream 30, 60, 90 and 120 frames ahead with high accuracy. The novelty lies in a combined algorithm built from multiple FNet blocks, in which the fast Fourier transform takes the role of the attention mechanism and operates on the concatenated coordinates of the key points. Experiments on Human3.6M and on our own real data confirm the effectiveness of the FNet-based approach compared to the traditional LSTM-based approach. The proposed algorithm matches the accuracy of advanced models while running faster and using fewer computational resources, which makes it applicable in collaborative robotic solutions.
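As an illustration of the Fourier mixing idea mentioned above, the following is a minimal NumPy sketch of an FNet-style mixing sublayer applied to concatenated key-point coordinates. It is not the KeyFNet implementation: the function names, the 17 key points, the 30-frame input window, and the random data are all assumptions made for the example.

```python
import numpy as np

def fourier_mixing(x):
    """FNet-style token mixing: 2D FFT over (frames, features), keeping the real part."""
    return np.fft.fft2(x, axes=(-2, -1)).real

def layer_norm(x, eps=1e-6):
    """Normalize each frame vector to zero mean and (approximately) unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def fnet_mixing_sublayer(x):
    """Residual connection around the Fourier mixing, followed by layer normalization."""
    return layer_norm(x + fourier_mixing(x))

# Hypothetical input: 30 observed frames, 17 key points, (x, y) per key point.
rng = np.random.default_rng(0)
keypoints = rng.normal(size=(30, 17, 2))

# Concatenate the coordinates of all key points in a frame into one vector.
tokens = keypoints.reshape(30, -1)          # shape (30, 34)
mixed = fnet_mixing_sublayer(tokens)        # same shape, real-valued
```

The Fourier transform has no learned parameters, which is one reason this kind of mixing is cheaper than self-attention; a full block would typically add a feed-forward sublayer after the mixing step.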
Keywords: key point prediction, transformers, collaborative robotic systems, deep learning.
UDC: 004.93
Presented: A. I. Avetisyan
Received: 02.09.2023 Revised: 15.09.2023 Accepted: 24.10.2023