Percentile objective criteria in limiting average Markov Control Problems

Filar, Jerzy A
Krass, Dmitry
Ross, Keith W
Institute of Electrical and Electronics Engineers
Infinite horizon Markov Control Problems, or Markov Decision Processes (MDPs, for short), have been extensively studied since the 1950s. One of the most commonly considered versions is the so-called "limiting average reward" model, in which the controller aims to maximize the expected value of the limit-average ("long-run average") of an infinite stream of single-stage rewards or outputs. There are now a number of good algorithms for computing optimal deterministic policies in limiting average MDPs. In this paper we adopt the point of view that there are many natural situations where the controller is interested in finding a policy that achieves a specified target level of long-run average reward with a sufficiently high probability, that is, a percentile.
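The percentile objective described in the abstract can be illustrated with a small Monte Carlo sketch: fix a stationary policy on a toy Markov chain and estimate the probability that the long-run average reward reaches a target level. The two-state chain, its transition probabilities, and its rewards below are invented for illustration and are not from the paper; the finite horizon serves only as a proxy for the limiting average.

```python
import random

# Hypothetical 2-state chain under a fixed stationary policy (invented
# for this sketch). Each entry maps a state to (probability, next state)
# pairs; R gives the single-stage reward earned in each state.
P = {0: [(0.9, 0), (0.1, 1)],
     1: [(0.5, 0), (0.5, 1)]}
R = {0: 1.0, 1: 3.0}

def average_reward(horizon, seed):
    """Finite-horizon proxy for the limiting average reward of one run."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        total += R[state]
        u, acc = rng.random(), 0.0
        for prob, nxt in P[state]:
            acc += prob
            if u <= acc:
                state = nxt
                break
    return total / horizon

def percentile_estimate(target, runs=200, horizon=2000):
    """Fraction of runs whose average reward reaches the target level."""
    hits = sum(average_reward(horizon, seed) >= target for seed in range(runs))
    return hits / runs

# Every state pays at least 1 per stage, so the target 1.0 is always met.
print(percentile_estimate(target=1.0))  # prints 1.0
```

The paper's question runs in the opposite direction: rather than evaluating one given policy, it asks for a policy whose probability of meeting the target is sufficiently high.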
Mathematics, Markov Decision Processes
Filar, J.A., Krass, D. and Ross, K.W., 1989. Percentile objective criteria in limiting average Markov Control Problems. Proceedings of the 28th IEEE Conference on Decision and Control, vol. 2, pp. 1273-1276.