Title:

Importance sampling for stochastic programming

Stochastic programming models are large-scale optimization problems that are used to facilitate decision-making under uncertainty. Optimization algorithms for such problems need to evaluate the expected future costs of current decisions, often referred to as the recourse function. In practice, this calculation is computationally difficult because it requires the evaluation of a multidimensional integral whose integrand is itself an optimization problem. As a result, the recourse function has to be estimated using techniques such as scenario trees or Monte Carlo methods, both of which require numerous function evaluations to produce accurate results for large-scale problems with multiple periods and high-dimensional uncertainty. In this thesis, we introduce an importance sampling framework for stochastic programming that can produce accurate estimates of the recourse function using a small number of samples. Previous approaches to importance sampling in stochastic programming were limited to problems where the uncertainty was modelled using discrete random variables and the recourse function was additively separable in the uncertain dimensions. Our framework avoids these restrictions by pairing Markov chain Monte Carlo methods with kernel density estimation algorithms to build a nonparametric importance sampling distribution, which can then be used to produce a low-variance estimate of the recourse function. We demonstrate the increased accuracy and efficiency of our approach using variants of well-known multistage stochastic programming problems. Our numerical results show that our framework produces more accurate estimates of the optimal value of stochastic programming models, especially for problems with moderate-to-high variance distributions or rare-event distributions.
For example, in some applications we found that when the random variables are drawn from a rare-event distribution, our proposed algorithm achieves a fourfold reduction in mean squared error and variance relative to existing methods (e.g., stochastic dual dynamic programming (SDDP) with crude Monte Carlo or SDDP with quasi-Monte Carlo sampling) for the same number of samples. When the random variables are drawn from a high-variance distribution, our proposed algorithm reduces the variance by a factor of two on average compared with other methods, at approximately the same mean squared error and for a fixed number of samples. We use our proposed algorithm to solve a capacity expansion planning problem in the electric power industry. The model includes the unit commitment problem and maintenance scheduling. It allows investors to make optimal decisions on the capacity and type of generators to build in order to minimize capital and operating costs over a long planning horizon. Our model computes the optimal schedule for each generator while meeting demand and respecting the engineering constraints of each generator. To reduce the problem size, we use an aggregation method that groups generators with similar features. The numerical experiment shows that by clustering generators of the same technology and similar size, and applying the SDDP algorithm with our proposed sampling framework to this simplified formulation, we are able to solve the problem in only one quarter of the time required to solve the original problem with conventional algorithms. The speedup is achieved without a significant reduction in the quality of the solution.
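
The sampling scheme described above, in which Markov chain Monte Carlo draws are smoothed by kernel density estimation into an importance sampling distribution, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the uncertainty is a single standard normal variable, the function `recourse` is a hypothetical stand-in for the recourse function at a fixed first-stage decision, the chain is a plain random-walk Metropolis sampler, and the kernel is Gaussian with a normal-reference rule-of-thumb bandwidth. The chain targets a density proportional to |Q(xi)| f(xi), which is the classical minimum-variance choice for estimating E[Q(xi)].

```python
import math
import random
import statistics


def recourse(xi):
    """Hypothetical stand-in for the recourse function at a fixed decision."""
    return max(xi - 3.0, 0.0)  # nonzero only in a rare tail event


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def metropolis_samples(n, rng, start=3.5, step=0.5, burn_in=500):
    """Random-walk Metropolis chain targeting g*(xi) proportional to |Q(xi)| f(xi)."""
    def target(xi):
        return abs(recourse(xi)) * normal_pdf(xi)

    current, current_density = start, target(start)
    chain = []
    for i in range(n + burn_in):
        proposal = current + rng.gauss(0.0, step)
        proposal_density = target(proposal)
        # Accept with probability min(1, proposal_density / current_density).
        if rng.random() * current_density < proposal_density:
            current, current_density = proposal, proposal_density
        if i >= burn_in:
            chain.append(current)
    return chain


def kde_pdf(x, centers, h):
    """Gaussian kernel density estimate built on the MCMC draws."""
    return sum(normal_pdf(x, c, h) for c in centers) / len(centers)


def importance_sampling_estimate(n_mcmc=1000, n_is=1000, seed=0):
    rng = random.Random(seed)
    centers = metropolis_samples(n_mcmc, rng)
    # Normal-reference rule-of-thumb bandwidth (an assumption, not tuned).
    h = 1.06 * statistics.stdev(centers) * n_mcmc ** (-0.2)
    values = []
    for _ in range(n_is):
        # Sampling from the KDE: pick a center, then add kernel noise.
        xi = rng.choice(centers) + rng.gauss(0.0, h)
        weight = normal_pdf(xi) / kde_pdf(xi, centers, h)  # f / g_hat
        values.append(recourse(xi) * weight)
    estimate = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n_is)
    return estimate, std_error


if __name__ == "__main__":
    est, se = importance_sampling_estimate()
    print(f"importance sampling estimate: {est:.3e} (std. error {se:.1e})")
```

For this toy problem the exact expectation is available in closed form as phi(3) - 3(1 - Phi(3)), where phi and Phi are the standard normal density and distribution function, so the estimate can be checked directly. A Gaussian kernel with a small bandwidth has lighter tails than the target, which may produce occasional large weights in less benign settings; the sketch makes no attempt to address this.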
