Abstract:
We consider application of the two-armed bandit problem to processing a large number $N$ of data where two alternative processing methods can be used. We propose a strategy which at the first stages, whose number is at most $r-1$, compares the methods, and at the final stage applies only the best one obtained from the comparison. We find asymptotically optimal parameters of the strategy and observe that the minimax risk is of the order of $N^\alpha$, where $\alpha=2^{r-1}/(2^r-1)$. Under parallel processing, the total operation time is determined by the number $r$ of stages but not by the number $N$ of data.