
Teor. Veroyatnost. i Primenen., 2019 Volume 64, Issue 3, Pages 442–455 (Mi tvp5303)

This article is cited in 1 paper

Gittins index for simple family of Markov bandit processes with switching cost and no discounting

M. P. Savelov

Novosibirsk State University

Abstract: We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case where the state spaces of all bandits are finite. An optimal strategy is one that attains the largest average reward per unit time over an infinite time horizon. For this problem, it is shown that an optimal strategy can be specified by a Gittins index under the natural assumption that the switching penalties are nonnegative.

Keywords: multicomponent systems, Gittins index, simple family of alternative Markov bandit processes, multiarmed bandit problem, Markov decision process, controlled Markov processes, long run average return, no discounting, switching penalties, optimal strategy.

Received: 26.03.2019
Accepted: 20.06.2019

DOI: 10.4213/tvp5303


English version:
Theory of Probability and its Applications, 2019, 64:3, 355–364
