Title:

Some theoretical essays on functional data classification

Functional data analysis is a fast-growing research area in statistics that deals with the statistical analysis of infinite-dimensional (functional) data. For many pattern recognition problems with finite-dimensional data there is a solid theoretical foundation; for example, it is known under which assumptions various classifiers have desirable theoretical properties, such as consistency. It is therefore natural to extend this theory to the setting of infinite-dimensional data. The thesis pursues two directions: one in which full curves are observed, and one in which only sparse and irregular curves are observed. In the first direction, the main goal is to justify a logistic classifier in which only the projection of the parameter function onto some subspace is estimated via maximum quasi-likelihood and the rest of its coordinates are set to zero. This is preceded by a study of the problem of detecting sample point separation in logistic regression, the case in which the maximum quasi-likelihood estimate of the model parameter does not exist or is not unique. In the second direction, the problem of extending sparsely and irregularly sampled functional data to full curves is considered, so that the theory from the first direction could potentially be applied in the future. The thesis makes several contributions. First, it is proved that the separating hyperplane can be found from a finite set of candidates, and an upper bound on the probability of point separation is given. Second, assumptions under which the logistic classifier is consistent are established, although simulation studies reveal that some of these assumptions are not necessary and may be relaxed. Third, the thesis proposes a collaborative curve extension method, which is proven to be consistent under certain assumptions.
