Title:
|
Efficient combinatorial algorithms for DNA microarray design
|
The advent of efficient genome sequencing tools and high-throughput experimental
biotechnology has led to enormous progress in the life sciences. The DNA microarray
is among the most important of these innovations. It allows the expression of
thousands of genes to be measured simultaneously by analysing hybridisation data. Such measurements
have proved invaluable in understanding the development of
diseases such as cancer. However, analysing the data is non-trivial, since the hybridisation
data depend on the quality of the DNA microarray. A high-quality DNA microarray
leads to more efficient hybridisation, stronger signals, and more reliable data.
Because the reliability of the data is essential, the development of novel algorithms and
techniques for DNA microarray design is crucial.
This thesis considers a number of combinatorial issues in selecting, placing,
and synthesising probes during the DNA microarray design process. A probe is
a specific sequence of single-stranded DNA or RNA, typically labelled with a radioactive
or fluorescent tag, which is designed to bind to, and thereby identify, a
particular segment of DNA (or RNA). The probe selection problem we study is
to find, for each gene sequence, a unique probe such that every gene in the given
dataset can be identified. However, due to homology, a gene sometimes does not
have a unique probe; in that case we use a small number of non-unique probes to identify
the gene. The challenge is that a gene sequence contains many candidate probes,
and we have to find the right one (or a small subset) efficiently.
A randomised probe selection algorithm for DNA microarray design is proposed.
The algorithm avoids the exhaustive search for optimal probes that some existing algorithms require. We implement the randomised probe selection algorithm and
develop a probe selection software tool, RANDPS. Investigations using several real-life
microarray datasets show that the algorithm is able to find high-quality probes.
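The idea of sampling candidates at random rather than searching exhaustively can be illustrated with a minimal sketch. This is not the RANDPS implementation: the function name `find_unique_probe`, the fixed probe length `k`, the `tries` limit, and the use of plain substring tests for uniqueness are all assumptions made for illustration, and real probe selection would also consider hybridisation criteria.

```python
import random

def find_unique_probe(genes, target, k, tries=100, rng=None):
    """Sample random length-k substrings of genes[target] and return the first
    one that occurs in no other gene, or None if no such substring is found
    within `tries` samples (hypothetical helper, not part of RANDPS)."""
    rng = rng or random.Random()
    seq = genes[target]
    if len(seq) < k:
        return None
    for _ in range(tries):
        start = rng.randrange(len(seq) - k + 1)
        candidate = seq[start:start + k]
        # The candidate is unique if no other gene contains it as a substring.
        if all(candidate not in other
               for i, other in enumerate(genes) if i != target):
            return candidate
    return None
```

When a gene shares all of its length-k substrings with other genes, the function returns None, which corresponds to the case where non-unique probes are needed.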
Nevertheless, the number of probes selected might be too large to place
on a single microarray. Minimising the number of probes is therefore an important objective,
since the cost of a microarray experiment is proportional to it. To this end,
we investigate the string barcoding problem, in which a set of non-unique candidate probes is
given and the probes have to be chosen from that set. The objective is
to use an appropriate combination of probes with minimum cardinality such that all
genes in the dataset can be distinguished. An almost optimal O(n|S| log³ n)-time
approximation algorithm for the considered problem is presented. The approximation
procedure is a modification of the algorithm due to Berman et al. [10], which
obtains the best possible approximation ratio (1 + ln n). The improved time complexity
is a direct consequence of more careful management of processed sets, use
of several specialised graph and string data structures, as well as tighter time complexity
analysis based on an amortised argument.
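To make the barcoding objective concrete, the following sketch picks probes greedily until every pair of genes is distinguished. It is a naive set-cover-style greedy written for illustration only: it is neither the algorithm of Berman et al. nor the faster variant described above, and it makes no claim to the stated running time or approximation ratio. The function name `greedy_barcode` and the rule that a probe distinguishes two genes when it occurs in exactly one of them are assumptions.

```python
from itertools import combinations

def greedy_barcode(genes, candidates):
    """Greedily choose probes until every pair of genes is distinguished.

    Assumption: a probe distinguishes two genes if it occurs (as a substring)
    in exactly one of them. Raises ValueError if some pair of genes cannot be
    distinguished by any candidate probe."""
    # Occurrence signature of each probe: which genes contain it.
    occurs = {p: [p in g for g in genes] for p in candidates}
    undistinguished = set(combinations(range(len(genes)), 2))
    chosen = []
    while undistinguished:
        def gain(p):
            sig = occurs[p]
            return sum(1 for i, j in undistinguished if sig[i] != sig[j])
        # Pick the probe that separates the most still-undistinguished pairs.
        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("remaining gene pairs cannot be distinguished")
        chosen.append(best)
        sig = occurs[best]
        undistinguished = {(i, j) for i, j in undistinguished
                           if sig[i] == sig[j]}
    return chosen
```

The presence or absence of each chosen probe in a gene then forms that gene's barcode, and the loop ends only when all barcodes are pairwise distinct.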
After probes are selected, they are synthesised on the microarray using
a light-directed chemical process in which unintended illumination may compromise
the quality of the microarray experiments. Border length is a measure of the
amount of unwanted illumination, and the objective of the border minimisation problem (BMP) is to minimise
the total border length during the probe synthesis process. This problem is believed to
be NP-hard, and we study its approximation in asynchronous synthesis.
As far as we know, this is the first result with a proven performance guarantee.
The main result is an O(√n log² n)-approximation, where n is the number of
probes to be synthesised. In the case where the placement is given in advance, we
show that the problem is O(log² n)-approximable. A related problem, the agreement
maximisation problem (MAP), is also considered in this chapter. In contrast to
BMP, we show that MAP admits a constant approximation even when placement is
not given in advance.
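As a rough illustration of the quantity being minimised, the sketch below computes the total border length of a given placement and embedding. It assumes a common formulation in which each probe is embedded in a shared deposition sequence and the border between two adjacent spots counts the synthesis steps in which exactly one of them is unmasked; the function name `border_length`, the grid representation, and the use of '-' for a masked step are illustrative choices, not definitions taken from the thesis.

```python
def border_length(grid):
    """Total border length of a placement.

    `grid` is a rectangular list of rows; each cell holds an embedding, a
    string with one character per synthesis step, where '-' means the spot
    is masked in that step (assumed representation)."""
    def conflicts(a, b):
        # Steps in which exactly one of the two spots is unmasked.
        return sum((x == '-') != (y == '-') for x, y in zip(a, b))

    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:   # horizontal neighbour
                total += conflicts(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:   # vertical neighbour
                total += conflicts(grid[r][c], grid[r + 1][c])
    return total
```

For example, with the deposition sequence ACGTACGT, the probes AC, AG, CT and GT could be embedded as "AC------", "A-G-----", "-C-T----" and "--GT----" and placed on a 2 x 2 grid; changing either the placement or the embeddings changes the total.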
|