A. V. Kolnogorov, “On Optimal Prior Learning Time in the Two-Armed Bandit Problem”, Probl. Peredachi Inf., 2000, Volume 36, Issue 4, Pages 117

This article is cited in 2 papers

Automata Theory

On Optimal Prior Learning Time in the Two-Armed Bandit Problem

A. V. Kolnogorov

Abstract: For the two-armed bandit problem considered on a known finite time horizon $T$, a strategy with an a priori determined learning time is proposed. Based on the loss balance equation, an exact asymptotic estimate of this learning time is established; it is found to be of order $T^{2/3}$. For close distributions the estimate changes: for a Bernoulli two-armed bandit, the learning time in this case is approximately $T/3$.
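
The kind of strategy described in the abstract can be illustrated with a minimal sketch, assuming a Bernoulli two-armed bandit and a simple "learn, then commit" rule. This is not the paper's construction: the function names, the alternating sampling scheme, and the constant `c` in the learning time $c\,T^{2/3}$ are hypothetical choices made only for illustration, and the case of close distributions (learning time about $T/3$) is not covered.

```python
import random


def learning_time(horizon, c=1.0):
    """Hypothetical rule: a learning time of order T^(2/3).

    The constant c is an assumption, not a value taken from the paper.
    The result is made even so that both arms are sampled equally often.
    """
    t = int(round(c * horizon ** (2.0 / 3.0)))
    t -= t % 2
    return max(2, min(t, horizon))


def explore_then_commit(p1, p2, horizon, t_learn, rng):
    """Alternate between the arms for t_learn steps, then commit.

    p1, p2 are the success probabilities of the two Bernoulli arms.
    Returns the total reward and the index (0 or 1) of the committed arm.
    """
    probs = (p1, p2)
    wins = [0, 0]
    total = 0
    # Learning phase: the arms are pulled in turn, t_learn / 2 times each.
    for t in range(t_learn):
        arm = t % 2
        reward = 1 if rng.random() < probs[arm] else 0
        wins[arm] += reward
        total += reward
    # Commit phase: play the arm with more observed successes (ties go to arm 0).
    best = 0 if wins[0] >= wins[1] else 1
    for _ in range(horizon - t_learn):
        total += 1 if rng.random() < probs[best] else 0
    return total, best


if __name__ == "__main__":
    T = 1000
    t_learn = learning_time(T)
    total, best = explore_then_commit(0.6, 0.4, T, t_learn, random.Random(0))
    # Loss relative to always playing the better arm (in expectation, 0.6 * T).
    print(t_learn, best, 0.6 * T - total)
```

In this sketch the learning time is fixed before play begins and depends only on the horizon, which is the sense of "a priori determined" used here. A longer learning phase lowers the chance of committing to the worse arm but spends more pulls on it during learning, which is the trade-off a loss balance argument would weigh.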

UDC: 621.391.1-503.5

Received: 22.06.1999
Revised: 24.07.2000