Use this URL to cite or link to this record in EThOS:
Title: Identification of antigen-specific patterns from high-dimensional sequencing data
Author: Sun, Yuxin
ISNI:       0000 0004 9359 4898
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
T cells recognize antigens using a diverse set of antigen-specific T-cell receptors (TCRs) on the surface. This poses two challenges for studying TCRs that respond to a given antigen. First, the enormous diversity of the TCR repertoire creates an ultra-high dimensional feature space; second, TCRs that respond to an antigen are often correlated. This thesis aims to develop efficient machine learning algorithms concerning both problems for feature selection from high-dimensional feature spaces. Our research concerns two subproblems: identification of antigen-enriched sequence motifs within the CDR3 region of TCRs and antigen-enriched entire TCR sequences. We apply a string kernel and a Fisher kernel to represent subsequences and develop fast algorithms to learn antigen-specific subsequences from graph-represented features. Both fixed-length and varying-length subsequences from mouse samples are selected with high efficiency and accuracy. Our results also suggest that short subsequences are found at specific positions, which may correspond to the actual interacting regions between TCR and MHC-peptide complex. We further develop fast algorithms to solve exclusive group Lasso and provide a novel methodology to select entire TCR sequences that are relevant to specific antigens. Our solution concerns a notoriously difficult problem in feature selection to select highly correlated features. Experiments on synthetic data show good performance under various correlation settings. The proposed algorithms are also validated on real-world data to select a sparse set of entire TCRs with high accuracy.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available