A. V. Kolnogorov, “Invariant description of control in a Gaussian one-armed bandit problem”, Vestnik YuUrGU. Ser. Mat. Model. Progr., 2024, Volume 17, Issue 1, Pages 27

Mathematical Modelling

Invariant description of control in a Gaussian one-armed bandit problem

A. V. Kolnogorov

Yaroslav-the-Wise Novgorod State University, Veliky Novgorod, Russian Federation

Abstract: We consider the one-armed bandit problem as applied to batch data processing when there are two alternative processing methods with different efficiencies and the efficiency of the second method is a priori unknown. During processing, it is necessary to determine the more effective method and ensure its preferential use. Processing is performed in batches, so the distributions of incomes are Gaussian. We consider the case where both the mathematical expectation and the variance of the income corresponding to the second action are a priori unknown. This case describes a situation in which the batches themselves and their number are of moderate or small size. We obtain recursive equations for computing the Bayesian risk and regret, which we then present in an invariant form with a control horizon equal to one. This makes it possible to obtain estimates of the Bayesian and minimax risks that are valid for all control horizons that are multiples of the number of processed batches.
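To make the setting concrete, the following minimal Python sketch simulates batch processing with a known first method and a second method whose income mean and variance are unknown to the decision rule. It is an illustration only and is not the Bayesian or minimax strategy of the paper: the stopping rule (abandon the second method once its sample mean plus `z` standard errors falls below the known mean `m1`) and all names (`run_threshold_strategy`, `m1`, `m2`, `sigma`, `z`) are assumptions introduced here. The regret reported is the expected loss relative to always using the better method.

```python
import math
import random
import statistics


def run_threshold_strategy(m1, m2, sigma, n_batches, batch_size, z=1.0, seed=0):
    """Simulate a one-armed bandit with batch processing (illustrative only).

    m1        -- known expected income per item of the first method
    m2, sigma -- true mean and standard deviation of the second method's
                 income; hidden from the decision rule, used only to
                 generate data and to score the regret
    Returns (number of batches processed by the second method, regret).
    """
    rng = random.Random(seed)
    observed = []            # incomes seen so far from the second method
    batches_on_second = 0

    for _ in range(n_batches):
        if len(observed) >= 2:
            mean = statistics.fmean(observed)
            stderr = statistics.stdev(observed) / math.sqrt(len(observed))
            # Hypothetical rule: give up on the second method once it looks
            # worse than the known one; no new data arrives afterwards,
            # so the switch is permanent.
            if mean + z * stderr < m1:
                break
        observed.extend(rng.gauss(m2, sigma) for _ in range(batch_size))
        batches_on_second += 1

    batches_on_first = n_batches - batches_on_second
    best = max(m1, m2)
    regret = batch_size * (
        batches_on_second * (best - m2) + batches_on_first * (best - m1)
    )
    return batches_on_second, regret


if __name__ == "__main__":
    used, regret = run_threshold_strategy(
        m1=0.0, m2=-0.5, sigma=1.0, n_batches=50, batch_size=20
    )
    print(f"batches on second method: {used}, regret: {regret:.1f}")
```

Because the decision is revised only between batches, the rule needs at most `n_batches` updates regardless of `batch_size`, which is the practical appeal of batch processing in this setting.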

Keywords: one-armed bandit, batch processing, Bayesian and minimax approaches, invariant description.

UDC: 519.244, 519.83

MSC: 62C10, 62L05, 91A35

Received: 22.11.2023

Language: English

DOI: 10.14529/mmp240103