Abstract:
For the two-armed bandit problem on a known finite time horizon $T$, a strategy with an a priori fixed learning time is proposed. From the loss balance equation, an exact asymptotic estimate of the learning time is established; it is of order $T^{2/3}$. For close distributions the estimate changes: for a Bernoulli two-armed bandit, the learning time in this case is approximately $T/3$.
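
As a rough illustration of a strategy with a preset learning phase, the following Python sketch alternates the two arms for a fixed number of pulls and then commits to the arm with the higher empirical mean. It is not necessarily the strategy analyzed here: the function names, the constant `c`, and the simple rule `c * T^(2/3)` for the total learning time are assumptions made only for illustration.

```python
import random


def learning_time(horizon: int, c: float = 1.0) -> int:
    """Total learning-phase pulls, taken as c * T^(2/3).

    The constant c is a hypothetical tuning parameter, not a value
    taken from the text.
    """
    n = int(round(c * horizon ** (2.0 / 3.0)))
    # Keep at least one pull per arm and never exceed the horizon.
    return max(2, min(n, horizon))


def explore_then_commit(p1: float, p2: float, horizon: int,
                        c: float = 1.0, seed: int = 0):
    """Simulate a Bernoulli two-armed bandit with a fixed learning phase.

    Returns (learning pulls, index of the committed arm, realized loss
    relative to the expected income of always pulling the better arm).
    """
    rng = random.Random(seed)
    probs = (p1, p2)
    n_learn = learning_time(horizon, c)

    wins = [0, 0]
    pulls = [0, 0]
    total = 0

    # Learning phase: alternate the arms, so each gets about n_learn / 2 pulls.
    for t in range(n_learn):
        arm = t % 2
        reward = 1 if rng.random() < probs[arm] else 0
        wins[arm] += reward
        pulls[arm] += 1
        total += reward

    # Commit phase: play the arm with the higher empirical mean to the end.
    means = [wins[a] / pulls[a] for a in (0, 1)]
    best = 0 if means[0] >= means[1] else 1
    for _ in range(horizon - n_learn):
        total += 1 if rng.random() < probs[best] else 0

    loss = horizon * max(probs) - total
    return n_learn, best, loss
```

Calling `explore_then_commit(0.6, 0.4, 10_000)` runs one simulated horizon; averaging the returned loss over many seeds gives an empirical estimate of the expected loss for a chosen `c`.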