Title:

Some contributions to Markov decision processes

In a nutshell, this thesis studies discretetime Markov decision processes (MDPs) on Borel Spaces, with possibly unbounded costs, and both expected (discounted) total cost and longrun expected average cost criteria. In Chapter 2, we systematically investigate a constrained absorbing MDP with expected total cost criterion and possibly unbounded (from both above and below) cost functions. We apply the convex analytic approach to derive the optimality and duality results, along with the existence of an optimal finite mixing policy. We also provide mild conditions under which a general constrained MDP model with stateactiondependent discount factors can be equivalently transformed into an absorbing MDP model. Chapter 3 treats a more constrained absorbing MDP, as compared with that in Chapter 2. The dynamic programming approach is applied to a reformulated unconstrained MDP model and the optimality results are obtained. In addition, the correspondence between policies in the original model and the reformulated one is illustrated. In Chapter 4, we attempt to extend the dynamic programming approach for standard MDPs with expected total cost criterion to the case, where the (iterated) coherent risk measure of the cost is taken as the performance measure to be minimized. The cost function under our consideration is allowed to be unbounded from the below, and possibly arbitrarily unbounded from the above. Under a fairly weak version of continuitycompactness conditions, we derive the optimality results for both the finite and infinite horizon cases, and establish value iteration as well as policy iteration algorithms. The standard MDP and the iterated conditional valueatrisk of the cost function are illustrated as two examples. Chapter 5 and 6 tackle MDPs with longrun expected average cost criterion. In Chapter 5, we consider a constrained MDP with possibly unbounded (from both above and below) cost functions. Under Lyapunovlike conditions, we show the sufficiency of stable policies to the concerned constrained problem. Furthermore, we introduce the corresponding space of performance vectors and manage to characterize each of its extreme points with a deterministic stationary policy. Finally, the existence of an optimal finite mixing policy is justified. Chapter 6 concerns an unconstrained MDP with the cost functions unbounded from the below and possibly arbitrarily unbounded from the above. We provide a detailed discussion on the issue of sufficient policies in the denumerable case, establish the average cost optimality inequality (ACOI) and show the existence of an optimal deterministic stationary policy. In Chapter 7, an inventoryproduction system is taken as an example of realworld applications to illustrate the main results in Chapter 2 and 5.
