Abstract:
The existence of a stationary average-reward $\varepsilon$-optimal policy is proved for discrete-time Markov decision chains with finitely many states, compact action sets, continuous transition functions, and upper semicontinuous reward functions.