Abstract:
Consideration is given to the model of a stochastic system comprised of a finite number of parallel independently running queues with heterogeneous servers, a single flow of independent jobs, and a single dispatcher which possesses full a priori information about the system's parameters and its initial state. At any moment, the dispatcher does not have any feedback from the queues but can memorize its previous routing decisions and time instants at which the decisions were made. Under quite general assumptions about the jobs' interarrival and jobs' size distributions and queues' scheduling, the (policy gradient) algorithm is proposed which allows one to locate the global optimum of the job's stationary mean sojourn or waiting time. The algorithm is based on the assumption that one can reach the neighborhood of the global optimum by applying the dispatching policy with a finite memory.
Keywords:parallel service systems, dispatching, control under incomplete observations, program control.