Percentile objective criteria in limiting average Markov Control Problems
Filar, Jerzy A
Ross, Keith W
Institute of Electrical and Electronics Engineers
Infinite horizon Markov Control Problems, or Markov Decision Processes (MDPs, for short), have been extensively studied since the 1950s. One of the most commonly considered versions is the so-called "limiting average reward" model. In this model the controller aims to maximize the expected value of the limit-average ("long-run average") of an infinite stream of single-stage rewards or outputs. There are now a number of good algorithms for computing optimal deterministic policies in limiting average MDPs. In this paper we adopt the point of view that there are many natural situations where the controller is interested in finding a policy that will achieve a sufficiently high long-run average reward (a target level) with a sufficiently high probability (a percentile).
Mathematics, Markov Decision Processes
Filar, J.A., Krass, D. and Ross, K.W., 1989. Percentile objective criteria in limiting average Markov Control Problems. Proceedings of the 28th IEEE Conference on Decision and Control, vol. 2, 1273-1276.