Sufficient Classes of Strategies in Discrete Dynamic Programming I: Decomposition of Randomized Strategies and Embedded Models
Theory of Probability & Its Applications, 1987; Society for Industrial and Applied Mathematics; Volume 31, Issue 4
DOI: 10.1137/1131088
ISSN: 1095-7219
Language: English
Topic(s): Auction Theory and Applications
Author: E. A. Fainberg

References

[1] N. V. Krylov, The construction of an optimal strategy for a finite controlled chain, Theory Probab. Appl., 10 (1965), 45–54. DOI: 10.1137/1110004.
[2] D. Blackwell, Discrete dynamic programming, Ann. Math. Statist., 33 (1962), 719–726.
[3] D. Blackwell, Discounted dynamic programming, Ann. Math. Statist., 36 (1965), 226–235.
[4] E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes, Grundlehren der Mathematischen Wissenschaften, Vol. 235, Springer-Verlag, Berlin, 1979.
[5] I. M. Sonin and E. A. Fainberg, Sufficient classes of strategies in countable controllable Markov chains with sum criterion, Dokl. Akad. Nauk SSSR, 275 (1984), 806–809. (In Russian.)
[6] T. P. Hill, On the existence of good Markov strategies, Trans. Amer. Math. Soc., 247 (1979), 157–176.
[7] H. Everett, Recursive games, in Contributions to the Theory of Games, Vol. 3, Annals of Mathematics Studies, No. 39, Princeton University Press, Princeton, NJ, 1957, 47–78.
[8] R. Ya. Chitashvili, On the existence of $\varepsilon$-optimal stationary policies for a controllable Markov chain, Soobshch. Akad. Nauk Gruzin. SSR, 83 (1976), 549–552. (In Russian.)
[9] A. A. Yushkevich and R. Ya. Chitashvili, Controllable random sequences and Markov chains, Uspekhi Mat. Nauk, 37 (1982), 213–242. (In Russian.)
[10] D. Ornstein, On the existence of stationary optimal strategies, Proc. Amer. Math. Soc., 20 (1969), 563–569.
[11] S. Demko and T. P. Hill, Decision processes with total-cost criteria, Ann. Probab., 9 (1981), 293–301.
[12] J. van der Wal, On stationary strategies in countable state total reward Markov decision processes, Math. Oper. Res., 9 (1984), 290–300.
[13] R. E. Strauch, Negative dynamic programming, Ann. Math. Statist., 37 (1966), 871–890.
[14] J. van der Wal, On uniformly nearly-optimal Markov strategies, in Operations Research Proceedings 1982 (Frankfurt, 1982), Springer, Berlin, 1983, 461–467.
[15] E. A. Fainberg and I. M. Sonin, Stationary and Markov policies in countable state dynamic programming, in Probability Theory and Mathematical Statistics (Tbilisi, 1982), Lecture Notes in Math., Vol. 1021, Springer, Berlin, 1983, 111–129.
[16] I. M. Sonin, The existence of a uniformly nearly-optimal Markov strategy for a controlled Markov chain with countable state space, in Models and Methods of Stochastic Optimization, TsEMI AN SSSR, 1984, 213–232. (In Russian.)
[17] J. van der Wal and J. Wessels, On the use of information in Markov decision processes, Statist. Decisions, 2 (1984), 1–21.
[18] C. Derman and R. E. Strauch, A note on memoryless rules for controlling sequential control processes, Ann. Math. Statist., 37 (1966), 276–278.
[19] I. I. Gikhman and A. V. Skorokhod, Controlled Random Processes, Springer-Verlag, New York, 1979.
[20] E. A. Fainberg, Nonrandomized Markov and semi-Markov strategies in dynamic programming, Theory Probab. Appl., 27 (1982), 116–126. DOI: 10.1137/1127010.
[21] J. van Nunen and J. Wessels, Markov decision processes with unbounded rewards, in Markov Decision Theory (Proc. Adv. Sem., Amsterdam, 1976), Math. Centre Tracts, No. 93, Math. Centrum, Amsterdam, 1977, 1–24.
[22] E. A. Fainberg and I. M. Sonin, Persistently nearly optimal strategies in stochastic dynamic programming, in Statistics and Control of Stochastic Processes (Moscow, 1984), Transl. Ser. Math. Engrg., Optimization Software, New York, 1985, 69–101. (Steklov Seminar.)
[23] J. Neveu, Mathematical Foundations of the Calculus of Probability, translated by A. Feinstein, Holden-Day, San Francisco, CA, 1965.
[24] P. A. Meyer, Probability and Potentials, Blaisdell Publishing Co., Waltham, MA, 1966.
[25] N. V. Krylov, Once more about the connection between elliptic operators and Itô's stochastic equations, in Statistics and Control of Stochastic Processes (Moscow, 1984), Transl. Ser. Math. Engrg., Optimization Software, New York, 1985, 214–229. (Steklov Seminar.)
[26] K. M. van Hee, Markov strategies in dynamic programming, Math. Oper. Res., 3 (1978), 37–41.
[27] E. A. Fainberg, Controlled Markov processes with arbitrary numerical criteria, Theory Probab. Appl., 27 (1982), 486–503.
[28] R. P. Kertz, Renewal plans and persistent optimality in countably additive gambling, Math. Oper. Res., 7 (1982), 361–382.

Cited By

- MDPs with setwise continuous transition probabilities, Operations Research Letters, Vol. 49, No. 5, 1 Sep 2021.
- Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm, European Journal of Operational Research, Vol. 240, No. 3, 1 Feb 2015.
- A two-phase time aggregation algorithm for average cost Markov decision processes, 2012 American Control Conference (ACC), 1 Jun 2012.
- Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes, Mathematics of Operations Research, Vol. 37, No. 1, 1 Feb 2012.
- Time aggregated Markov decision processes via standard dynamic programming, Operations Research Letters, Vol. 39, No. 3, 1 May 2011.
- Exact finite approximations of average-cost countable Markov decision processes, Automatica, Vol. 44, No. 6, 1 Jun 2008.
- Total Reward Criteria, Handbook of Markov Decision Processes, 1 Jan 2002.
- A Note on the Existence of Optimal Policies in Total Reward Dynamic Programs with Compact Action Sets, Mathematics of Operations Research, Vol. 25, No. 4, 1 Nov 2000.
- Continuity of Optimal Values and Solutions for Control of Markov Chains with Constraints, SIAM Journal on Control and Optimization, Vol. 38, No. 4, 26 Jul 2006.
- Notes on equivalent stationary policies in Markov decision processes with total rewards, Mathematical Methods of Operations Research, Vol. 44, No. 2, 1 Jun 1996.
- Finite state Markov decision models with average reward criteria, Stochastic Processes and their Applications, Vol. 49, No. 1, 1 Jan 1994.
- Non-randomized strategies in stochastic decision processes, Annals of Operations Research, Vol. 29, No. 1, 1 Dec 1991.
- Optimality of pure strategies in stochastic decision processes, 29th IEEE Conference on Decision and Control, 1 Jan 1990.
- Sufficient Classes of Strategies in Discrete Dynamic Programming. II: Locally Stationary Strategies, Theory of Probability & Its Applications, Vol. 32, No. 3, 17 Jul 2006.

Article & Publication Data

Journal: Theory of Probability & Its Applications, Volume 31, Issue 4 (1987), pp. 563–742
Article page range: pp. 658–668
Article DOI: 10.1137/1131088
Submitted: 22 February 1984
Published online: 17 July 2006
Copyright © 1987 Society for Industrial and Applied Mathematics
ISSN (print): 0040-585X
ISSN (online): 1095-7219
Publisher: Society for Industrial and Applied Mathematics