
Probl. Peredachi Inf., 2000 Volume 36, Issue 4, Pages 117–127 (Mi ppi501)

This article is cited in 2 papers

Automata Theory

On Optimal Prior Learning Time in the Two-Armed Bandit Problem

A. V. Kolnogorov


Abstract: For the two-armed bandit problem on a known finite time horizon $T$, a strategy with an a priori fixed learning time is proposed. From the loss balance equation, an exact asymptotic estimate of this learning time is derived; it is of order $T^{2/3}$. For close distributions the estimate changes: for a Bernoulli two-armed bandit, the learning time in this case is approximately $T/3$.
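
To illustrate what a strategy with a predetermined learning time can look like, here is a minimal Python sketch for a Bernoulli two-armed bandit. It is not taken from the paper: the even split of the learning phase between the arms, the empirical-mean selection rule, and the names `explore_then_commit` and `mean_loss` are all assumptions made for illustration, and the paper's actual strategy and constants may differ.

```python
import random


def explore_then_commit(p1, p2, horizon, learn_time, rng):
    """Play a Bernoulli two-armed bandit with a fixed learning phase.

    Assumption (not from the paper): the learning phase is split evenly
    between the arms, and the arm with the larger number of successes is
    then played until the horizon. Returns the total reward collected.
    """
    probs = (p1, p2)
    per_arm = learn_time // 2
    wins = [0, 0]
    total = 0
    # Learning phase: pull each arm per_arm times.
    for arm in (0, 1):
        for _ in range(per_arm):
            reward = 1 if rng.random() < probs[arm] else 0
            wins[arm] += reward
            total += reward
    # Commit phase: play the empirically better arm (ties go to arm 0).
    best = 0 if wins[0] >= wins[1] else 1
    for _ in range(horizon - 2 * per_arm):
        total += 1 if rng.random() < probs[best] else 0
    return total


def mean_loss(p1, p2, horizon, learn_time, runs=200, seed=0):
    """Monte Carlo estimate of the loss against always playing the better arm."""
    rng = random.Random(seed)
    ideal = horizon * max(p1, p2)
    average = sum(
        explore_then_commit(p1, p2, horizon, learn_time, rng)
        for _ in range(runs)
    ) / runs
    return ideal - average
```

One way to experiment with the orders quoted in the abstract is to compare `mean_loss(0.6, 0.4, T, int(T ** (2 / 3)))` with `mean_loss(0.6, 0.4, T, T // 3)` for a few values of `T`. The constant factor of 1 in front of $T^{2/3}$ is an arbitrary choice here, not the paper's estimate.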

UDC: 621.391.1-503.5

Received: 22.06.1999
Revised: 24.07.2000


English version:
Problems of Information Transmission, 2000, 36:4, 387–396

