
Avtomat. i Telemekh., 2021, Issue 11, Pages 16–29 (Mi at15826)



Bayesian distillation of deep learning models

A. V. Grabovoy (a), V. V. Strijov (b)

(a) Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow oblast, 141701 Russia
(b) Dorodnicyn Computing Centre, Russian Academy of Sciences, Moscow, 119333 Russia

Abstract: We study the problem of reducing the complexity of approximating models and consider methods based on the distillation of deep learning models. The concepts of trainer and student models are introduced; the student model is assumed to have fewer parameters than the trainer model. A Bayesian approach to student model selection is suggested: a method is proposed for assigning the a priori distribution of the student parameters on the basis of the a posteriori distribution of the trainer model parameters. Since the trainer and student parameter spaces do not coincide, we propose a mechanism that reduces the trainer model parameter space to the student model parameter space by changing the trainer model structure. A theoretical analysis of the proposed reduction mechanism is carried out, and a computational experiment is performed on synthetic and real data, with the FashionMNIST dataset used as the real data.
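To make the abstract concrete, the following is a minimal Python sketch of the general idea, not the authors' algorithm: a Gaussian approximation of the trainer posterior is reduced to the student's smaller parameter space and used as the student's prior, after which the student is fitted by MAP estimation. The linear models, the coordinate-selection reduction, and all function names are illustrative assumptions.

    # Minimal sketch (assumed setup, not the paper's exact method):
    # the student's Gaussian prior is centred on a reduction of the
    # trainer's posterior mean, and the student is fitted by MAP.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic binary classification data.
    n, d_trainer, d_student = 200, 10, 4
    X = rng.normal(size=(n, d_trainer))
    w_true = rng.normal(size=d_trainer)
    y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_map(X, y, prior_mean, prior_var, steps=500, lr=0.1):
        """MAP estimate of logistic-regression weights under a Gaussian prior."""
        w = prior_mean.copy()
        for _ in range(steps):
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) + (w - prior_mean) / prior_var
            w -= lr * grad / len(y)
        return w

    # 1. Trainer model: full parameter space, broad zero-mean prior.
    w_trainer = fit_map(X, y, np.zeros(d_trainer), prior_var=10.0)

    # 2. Reduce the trainer parameter space to the student's dimension:
    #    here we simply keep the coordinates with the largest posterior
    #    mean magnitude (a stand-in for the paper's structural reduction).
    keep = np.argsort(-np.abs(w_trainer))[:d_student]
    X_student = X[:, keep]
    prior_mean_student = w_trainer[keep]

    # 3. Student model: its prior is centred on the reduced trainer posterior.
    w_student = fit_map(X_student, y, prior_mean_student, prior_var=1.0)

    acc = np.mean((sigmoid(X_student @ w_student) > 0.5) == y)
    print(f"student accuracy: {acc:.3f}")

The design choice illustrated here is the one the abstract emphasizes: because the trainer and student parameter spaces differ, some reduction of the trainer posterior (here, coordinate selection) is needed before it can serve as the student prior.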

Keywords: model selection, Bayesian inference, model distillation, local transformation, probability space transformation.

Presented by a member of the Editorial Board: A. A. Lazarev

Received: 20.01.2021
Revised: 25.06.2021
Accepted: 30.06.2021

DOI: 10.31857/S0005231021110027


English version:
Automation and Remote Control, 2021, 82:11, 1846–1856


