Abstract:
The paper is concerned with the control of a semi-Markov process in which the transition probabilities at the jump times and the distributions of the sojourn time in each state depend on an unknown parameter. The control objective is to maximize the mean payoff per unit time. Minimum contrast estimators are used to design the adaptive control, and the asymptotic properties of the estimates of the unknown parameter obtained in the course of control are studied. Sufficient conditions are given under which the adaptive control converges to the optimal one. The case of complete information is analyzed as a special case.