Title: On variable selection in high dimensions, segmentation and multiscale time series
Author: Baranowski, Rafal
ISNI: 0000 0004 5989 6665
Awarding Body: London School of Economics and Political Science (LSE)
Current Institution: London School of Economics and Political Science (University of London)
Date of Award: 2016
Availability of Full Text: Full text unavailable from EThOS.
In this dissertation, we study the following three statistical problems. First, we consider a high-dimensional data framework, where the number of covariates potentially affecting the response is large relative to the sample size. In this setting, some of the covariates spuriously appear to have an impact on the response. To address this issue, we rank the covariates according to their impact on the response and use a subsampling scheme to identify the covariates that non-spuriously appear at the top of the ranking. We study the conditions under which such a set is unique and show that, with high probability, it can be recovered from the data by our procedure, for rankings based on measures commonly used in statistics. We illustrate its good practical performance in an extensive comparative simulation study and on microarray data.

Second, we propose a generic approach to the problem of detecting an unknown number of features in a time series of interest, such as changes in trend or jumps in the mean, occurring at unknown locations in time. Those locations naturally imply a decomposition of the data into segments of homogeneity, knowledge of which is useful, e.g., in estimating the mean of the series. We provide a precise description of the type of features we are interested in and, in two important scenarios, demonstrate that our methodology enjoys appealing theoretical properties. We show that the performance of our proposal matches or surpasses the state of the art in the scenarios tested and present its applications to three real datasets: oil price log-returns, temperature anomalies data and the UK House Price Index.

Finally, we introduce a class of univariate multiscale time series models and propose an estimation procedure to fit those models to the data. We demonstrate that our proposal, with high probability, correctly identifies important timescales, under a framework in which the largest timescale in the model diverges with the sample size. The good empirical performance of the method is illustrated in an application to high-frequency financial returns for stocks listed on the New York Stock Exchange. For all proposed methods, we provide efficient and publicly available computer implementations.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: HA Statistics