Title:

Statistical inference in high-dimensional matrix models

Matrix models are ubiquitous in modern statistics. For instance, they are used in finance to assess the interdependence of assets, in genomics to impute missing data, and in movie recommender systems to model the relationship between users and movie ratings. Typically, such models are either high-dimensional, meaning that the number of parameters may exceed the number of data points by many orders of magnitude, or nonparametric in the sense that the quantity of interest is an infinite-dimensional operator. This leads to new algorithms, and also to new theoretical phenomena that may occur when estimating a parameter of interest or functionals of it, or when constructing confidence sets. In this thesis, we consider three such matrix models as examples and develop statistical theory for them: matrix completion, Principal Component Analysis (PCA) with Gaussian data, and transition operators of Markov chains.

We start with matrix completion and investigate the existence of adaptive confidence sets in the 'Bernoulli' and 'trace regression' models. In the 'Bernoulli' model we show that adaptive confidence sets do not exist when the variance of the errors is unknown, whereas we give an explicit construction in the 'trace regression' model. In the known-variance case, we show, based on a testing argument, that adaptive confidence sets also exist in the 'Bernoulli' model.

Next, we consider PCA in a Gaussian observation model, with complexity measured by the effective rank, the reciprocal of the fraction of variance explained by the first principal component. We investigate estimation of linear functionals of eigenvectors and prove Berry-Esseen type bounds. Due to the high-dimensionality of the problem, we discover a new phenomenon: the plug-in estimator based on the sample eigenvector can have non-negligible bias and hence may no longer be √n-consistent. We show how to debias this estimator, achieving √n-convergence rates, and prove exactly matching minimax lower bounds.
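To make these objects concrete, the following minimal sketch computes the effective rank of a covariance matrix and the plug-in estimator of a linear functional ⟨a, u⟩ of its leading eigenvector. It uses a standard spiked-covariance toy model with illustrative parameter values, not the thesis's actual setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 50  # sample size and dimension (illustrative values)

# Spiked covariance: leading eigenvector u with eigenvalue 5, all others 1.
u = np.zeros(p)
u[0] = 1.0
Sigma = np.eye(p) + 4.0 * np.outer(u, u)

# Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op, the reciprocal of the
# fraction of variance explained by the first principal component.
eff_rank = np.trace(Sigma) / np.linalg.norm(Sigma, 2)

# Gaussian observations and the sample covariance.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n

# Plug-in estimator of the linear functional <a, u>: replace the true
# eigenvector by the top sample eigenvector (up to sign).
a = np.ones(p) / np.sqrt(p)
eigvecs = np.linalg.eigh(S)[1]      # eigenvalues ascending, so take last column
u_hat = eigvecs[:, -1]
u_hat *= np.sign(u_hat @ u)         # resolve the sign ambiguity for comparison
plug_in = a @ u_hat
```

In this toy example the effective rank is (49 + 5)/5 = 10.8, small relative to n, so the sample eigenvector aligns well with u; the thesis's phenomenon concerns regimes where the effective rank grows with n.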
Finally, we consider nonparametric estimation of the transition operator of a Markov chain and of its transition density. We assume that the singular values of the transition operator decay exponentially. For example, this assumption is fulfilled by discrete, low-frequency observations of periodised, reversible stochastic differential equations. Using penalization techniques from low-rank matrix estimation, we develop a new algorithm and show improved convergence rates.
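A standard building block of such penalization techniques is the proximal operator of the nuclear norm, i.e. soft-thresholding of singular values. The sketch below shows this generic step (it is illustrative, not the thesis's specific algorithm):

```python
import numpy as np

def svd_soft_threshold(A, lam):
    """Proximal operator of the nuclear norm penalty: soft-threshold the
    singular values of A at level lam, shrinking towards low rank."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - lam, 0.0)   # shrink each singular value by lam
    return (U * s) @ Vt            # scale columns of U, then recompose

# A rank-2 matrix with singular values 3 and 1; thresholding at 1.5
# removes the weaker direction and returns a rank-1 matrix.
A = 3.0 * np.outer([1.0, 0.0, 0.0], [1.0, 0.0]) \
    + 1.0 * np.outer([0.0, 1.0, 0.0], [0.0, 1.0])
B = svd_soft_threshold(A, 1.5)
```

Under exponentially decaying singular values, such thresholding discards all but a few directions, which is what drives the improved convergence rates.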
