Artificial Intelligence and Decision Making, 2023 Issue 3, Pages 98–108 (Mi iipr41)

Analysis of signals, audio and video information

Method for processing photo and video data from camera traps using a two-stage neural network approach

V. A. Efremov, A. V. Leus, D. A. Gavrilov, D. I. Mangazeev, I. V. Kholodnyak, A. S. Radysh, V. A. Zuev, N. A. Vodichev

Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow Region, Russia

Abstract: The paper proposes a technology for analyzing camera trap data using two-stage neural network processing. The first stage separates empty images from non-empty ones. To solve this problem, a comparative analysis of the YOLOv5, YOLOR, and YOLOX architectures was carried out, and the best-performing detector model was identified. The second stage classifies the objects found by the detector; the EfficientNetV2, SeResNet, ResNeSt, ReXNet, and ResNet models were compared. To train the detector and the classifier, a data preparation approach was developed that removes duplicate images from the sample. The method was extended with agglomerative clustering to split the sample into training, validation, and test subsets. In the object detection task, the YOLOv5-L6 algorithm performed best, with an accuracy of 98.5% on the data set. In the task of classifying the detected objects, the ResNeSt-101 architecture performed best, with a recognition quality of 98.339% on the test data.
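
The sketch below illustrates the kind of two-stage pipeline the abstract describes: a detector first filters out empty frames, then a classifier labels each detected animal. It is a minimal sketch, not the authors' released code; it assumes publicly available YOLOv5-L6 weights from PyTorch Hub and a ResNeSt-101 backbone from the timm library as stand-ins for the trained models, and the class count and confidence threshold are placeholders.

    # Two-stage camera trap processing: detection, then classification of crops.
    # Model names, num_classes and the confidence threshold are illustrative assumptions.
    import torch
    import timm
    from PIL import Image
    from timm.data import resolve_data_config
    from timm.data.transforms_factory import create_transform

    # Stage 1: detector that separates empty images from non-empty ones.
    detector = torch.hub.load("ultralytics/yolov5", "yolov5l6", pretrained=True)
    detector.conf = 0.25  # assumed confidence threshold

    # Stage 2: classifier for the objects found by the detector.
    classifier = timm.create_model("resnest101e", pretrained=True, num_classes=10)
    classifier.eval()
    preprocess = create_transform(**resolve_data_config({}, model=classifier))

    def process(image_path: str):
        """Return a list of (box, class_id) pairs; an empty list means an empty frame."""
        image = Image.open(image_path).convert("RGB")
        detections = detector(image).xyxy[0]  # rows of (x1, y1, x2, y2, conf, cls)
        results = []
        with torch.no_grad():
            for x1, y1, x2, y2, conf, _ in detections.tolist():
                crop = image.crop((x1, y1, x2, y2))          # classify each detected region
                logits = classifier(preprocess(crop).unsqueeze(0))
                results.append(((x1, y1, x2, y2), int(logits.argmax(dim=1))))
        return results

In this arrangement the detector alone decides whether a frame is empty, so the classifier only ever sees crops that contain an object, which matches the division of labor between the two stages described above.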

Keywords: camera trap images, agglomerative clustering, deep convolutional neural networks, detection, classification, two-stage approach.

DOI: 10.14357/20718594230310


