Simulation and learning in decision processes.
In this thesis we address the problem of adaptive control in complex stochastic systems, both when the system parameters are known and when they are unknown. The models we consider are those which, in the full-information case, are known as Markov Decision Processes. We introduce versions of two new algorithms, the optimiser and the p-learner. The optimiser is a simulation-based method for finding optimal values and optimal policies when the system parameters are known. The p-learner is an algorithm for learning the state transition probabilities; we use it in conjunction with the optimiser when the system parameters are unknown. We discuss in detail the choice of components in the different versions of the algorithms, and we examine two extended case studies to evaluate their performance over a range of learning parameters. In each case we compare the results with those of a deterministic method. We also address the convergence of the solutions generated by the optimiser.
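The optimiser and the p-learner themselves are not specified in this summary. Purely as a generic illustration of the two settings the thesis addresses, the sketch below sets up a hypothetical two-state, two-action Markov Decision Process (all transition probabilities and rewards are invented for the example): with known parameters, optimal values are computed directly; with unknown parameters, the transition probabilities are first estimated from simulated transitions, standing in for the kind of learning the p-learner performs.

```python
import random

# Hypothetical toy MDP (states, actions, probabilities, and rewards are
# assumptions for illustration, not the thesis's case studies).
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9  # discount factor

# Known transition probabilities P[s][a] = distribution over next states,
# and rewards R[s][a].
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}

def value_iteration(trans, n_iters=200):
    """Full-information setting: compute optimal values by value iteration."""
    V = [0.0 for _ in STATES]
    for _ in range(n_iters):
        V = [max(R[s][a] + GAMMA * sum(p * V[t]
                                       for t, p in enumerate(trans[s][a]))
                 for a in ACTIONS)
             for s in STATES]
    return V

def estimate_transitions(n_samples=5000, seed=0):
    """Unknown-parameter setting: estimate transition probabilities
    empirically from simulated one-step transitions."""
    rng = random.Random(seed)
    counts = {s: {a: [0, 0] for a in ACTIONS} for s in STATES}
    for _ in range(n_samples):
        s = rng.choice(STATES)
        a = rng.choice(ACTIONS)
        t = rng.choices(STATES, weights=P[s][a])[0]  # simulate one step
        counts[s][a][t] += 1
    # Normalise counts into empirical probability estimates.
    return {s: {a: [c / max(1, sum(counts[s][a])) for c in counts[s][a]]
                for a in ACTIONS}
            for s in STATES}

V_known = value_iteration(P)                          # plan with true parameters
V_learned = value_iteration(estimate_transitions())   # plan with estimates
```

With 5,000 simulated transitions the estimated probabilities are close to the true ones, so the two value vectors agree to within a small error; the gap shrinks as more transitions are observed.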