Nonrandomized Markov and semi-Markov policies in dynamic programming
E. A. Faĭnberg, Moscow
Abstract:
A discrete-time, infinite-horizon, nonstationary Markov decision model with Borel state and action spaces and the expected total reward criterion is considered. For an arbitrary fixed policy $\pi$, the following two statements are proved:
a) for an arbitrary initial measure $\mu$ and any constant $K<\infty$, there exists a nonrandomized Markov policy $\varphi$ such that
\begin{gather*}
w(\mu,\varphi)\ge w(\mu,\pi)\ \text{if}\ w(\mu,\pi)<\infty,
\\
w(\mu,\varphi)\ge K\ \text{if}\ w(\mu,\pi)=\infty,
\end{gather*}
b) for an arbitrary measurable function $K(x)<\infty$ on the initial state space $X_0$, there exists a nonrandomized semi-Markov policy $\varphi'$ such that, for every $x\in X_0$,
\begin{gather*}
w(x,\varphi')\ge w(x,\pi)\ \text{if}\ w(x,\pi)<\infty,
\\
w(x,\varphi')\ge K(x)\ \text{if}\ w(x,\pi)=\infty.
\end{gather*}
For every policy $\sigma$, the numbers $w(\mu,\sigma)$ and $w(x,\sigma)$ are the values of the criterion for the initial measure $\mu$ and the initial state $x$, respectively.
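As a toy illustration of statement a) in the simplest possible case, the sketch below uses a finite nonstationary model with two states, two actions, and a horizon of N = 3 steps. The horizon truncation, the reward table `r`, the transition table `p`, and the helper names (`q_value`, `evaluate`, `w_mu`, `nonrandomized_markov`) are hypothetical and do not come from the paper. In this finite setting all values are finite, and ordinary backward induction already yields a nonrandomized Markov policy $\varphi$ with $w(\mu,\varphi)\ge w(\mu,\pi)$ for a randomized Markov policy $\pi$. This is a standard argument, not Faĭnberg's construction, which covers Borel spaces, an infinite horizon, arbitrary policies, and the case $w(\mu,\pi)=\infty$.

```python
from typing import List

# Hypothetical toy model (not from the paper): 2 states, 2 actions, horizon N = 3.
N = 3
STATES = range(2)
ACTIONS = range(2)

# r[t][x][a]: reward at step t for action a in state x (nonstationary).
r = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.5, 1.5], [1.0, 0.0]],
    [[2.0, 0.0], [0.0, 3.0]],
]
# p[t][x][a][y]: probability of moving from x to y under action a at step t.
p = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]],
    [[[1.0, 0.0], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]],
    [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]],
]

# A Markov policy is stored as policy[t][x][a] = probability of action a in state x at step t.
Policy = List[List[List[float]]]


def q_value(t: int, x: int, a: int, v_next: List[float]) -> float:
    """One-step lookahead r_t(x, a) + sum_y p_t(y | x, a) * v_next[y]."""
    return r[t][x][a] + sum(p[t][x][a][y] * v_next[y] for y in STATES)


def evaluate(policy: Policy) -> List[float]:
    """Expected total reward w(x, policy) for each initial state x, by backward recursion."""
    v = [0.0 for _ in STATES]  # value after the last step
    for t in reversed(range(N)):
        v = [sum(policy[t][x][a] * q_value(t, x, a, v) for a in ACTIONS) for x in STATES]
    return v


def w_mu(mu: List[float], policy: Policy) -> float:
    """w(mu, policy) = sum_x mu(x) * w(x, policy) for an initial distribution mu."""
    v = evaluate(policy)
    return sum(mu[x] * v[x] for x in STATES)


def nonrandomized_markov() -> Policy:
    """Backward induction: at each (t, x), put probability 1 on one maximizing action."""
    policy = [[[0.0 for _ in ACTIONS] for _ in STATES] for _ in range(N)]
    v = [0.0 for _ in STATES]
    for t in reversed(range(N)):
        for x in STATES:
            best = max(ACTIONS, key=lambda a: q_value(t, x, a, v))
            policy[t][x][best] = 1.0
        v = [max(q_value(t, x, a, v) for a in ACTIONS) for x in STATES]
    return policy


# A randomized Markov policy pi (uniform over actions) and an initial measure mu.
pi = [[[0.5, 0.5] for _ in STATES] for _ in range(N)]
mu = [0.3, 0.7]
phi = nonrandomized_markov()

print("w(mu, pi)  =", w_mu(mu, pi))
print("w(mu, phi) =", w_mu(mu, phi))
```

Here $w(\mu,\sigma)$ is computed as $\sum_x \mu(x)\,w(x,\sigma)$, which is valid in this finite toy. In the toy, the backward-induction policy happens to dominate $\pi$ from every initial state at once, whereas statement b) above is formulated with nonrandomized semi-Markov policies for the general setting.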
Received: 28.11.1979