
Computer Research and Modeling, 2024 Volume 16, Issue 7, Pages 1601–1619 (Mi crm1237)

SPECIAL ISSUE

Review of algorithmic solutions for deployment of neural networks on lite devices

S. A. Khan, S. Shulepina, D. Shulepin, R. A. Lukmanov

Innopolis University, 1 Universitetskaya st., Innopolis, 420500, Russia

Abstract: In today’s technology-driven world, lite devices such as Internet of Things (IoT) devices and microcontrollers (MCUs) are becoming increasingly common. These devices are more energy-efficient and affordable, but they often have reduced capabilities compared to standard hardware, such as very limited memory and processing power for typical machine learning models. Modern machine learning models, however, can have millions of parameters, resulting in a large memory footprint. This complexity not only makes it difficult to deploy such large models on resource-constrained devices but also increases latency and processing inefficiency, which is critical in cases where real-time responses are required, such as autonomous driving and medical diagnostics. In recent years, neural networks have seen significant advancements in model optimization techniques that facilitate deployment and inference on these small devices. This narrative review offers a thorough examination of the progression and latest developments in neural network optimization, focusing on key areas such as quantization, pruning, knowledge distillation, and neural architecture search. It examines how these algorithmic solutions have progressed and how new approaches have improved upon existing techniques, making neural networks more efficient. The review is designed for machine learning researchers, practitioners, and engineers who may be unfamiliar with these methods but wish to explore the available techniques. It highlights ongoing research in optimizing networks for better performance, lower energy consumption, and faster training times, all of which play an important role in the continued scalability of neural networks. Additionally, it identifies gaps in current research and provides a foundation for future studies, aiming to enhance the applicability and effectiveness of existing optimization strategies.
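To make the memory-footprint argument concrete, the sketch below illustrates the simplest of the surveyed techniques: uniform symmetric post-training quantization of a weight tensor to int8, which stores one byte per weight instead of four. This is a minimal NumPy illustration of the general idea, not code from the reviewed paper; the function names and the per-tensor scaling scheme are the author's illustrative assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 codes with a single per-tensor scale.

    The scale is chosen so the largest-magnitude weight maps to +/-127;
    dequantization multiplies the codes back by this scale.
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

# A small float32 weight matrix shrinks to a quarter of its size,
# at the cost of a rounding error bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
assert q.dtype == np.int8                          # 1 byte/weight vs 4
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6   # bounded rounding error
```

The other families of methods covered in the review (pruning, knowledge distillation, neural architecture search) trade accuracy for size or speed through different mechanisms, but the evaluation logic is the same: compare the compressed model's error and footprint against the float32 baseline, as the assertions above do for quantization.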

Keywords: quantization, neural architecture search, knowledge distillation, pruning, reinforcement learning, model compression

UDC: 004.8

Received: 27.10.2024
Revised: 16.11.2024
Accepted: 25.11.2024

Language: English

DOI: 10.20537/2076-7633-2024-16-7-1601-1619
