Interpretation of the Caenorhabditis elegans genome sequence data through gene expression patterns
The nematode Caenorhabditis elegans has been studied extensively as a means of understanding development and cellular processes and was the first multicellular organism to have a sequenced genome. A number of C. elegans gene expression patterns have been characterized using several different experimental approaches, thereby providing a link between the nucleic acid sequence of a gene and the temporal and spatial nature of its expression. A systematic collation and analysis of C. elegans gene expression pattern data revealed a high degree of agreement in the results obtained using the different experimental approaches. During this analysis a group of genes was identified that was expressed specifically in one particular cell type, the excretory cell. To develop a strategy to identify cis-acting regulatory elements responsible for the control of cell type-specific expression, the DNA sequences of the potentially co-regulated excretory cell-expressing genes were analysed using two software packages, MEME and SPEXS. The MEME output contained many DNA motifs, but their sequence simplicity suggested that they were unlikely to be genuine regulatory elements. In contrast, the SPEXS output contained a vast number of more complex motifs. However, because SPEXS detects motifs simply based on sequence, without considering biological characteristics of cis-acting elements, no priority was assigned to the identified elements. Therefore a scoring strategy was devised that incorporated different weightings such that motifs occurring with high frequency within 1 kb upstream from the translational start, and with high sequence complexity, were assigned a higher score. To test the effectiveness of the scoring strategy when applied to the SPEXS output, a C. elegans muscle data set was analysed which was known to contain a previously characterized cis-acting element. The element in question was identified, suggesting that the scoring strategy worked well.
When the strategy was applied to the excretory cell data set, the highest scoring motif, and therefore the most likely candidate cis-acting element, was TTACCGAA. This motif was also detected in test sets containing excretory cell-expressing genes and in a data set containing C. briggsae orthologues of C. elegans genes. These results suggest that the scoring strategy is an effective approach to identifying cis-acting elements and that the motif TTACCGAA is potentially such an element, mediating excretory cell expression in C. elegans.
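
The general idea of the scoring strategy can be illustrated with a minimal Python sketch. It is not the scoring scheme used in this work, whose actual weightings are not given here. It assumes, purely for illustration, that a motif's score is the fraction of upstream sequences containing it within the 1 kb nearest the translational start, multiplied by the Shannon entropy of its base composition as a stand-in for sequence complexity. The function names, the entropy measure, the convention that each sequence string ends at the translational start, and the toy sequences are all hypothetical; only the forward strand is searched.

```python
import math
from collections import Counter


def complexity(motif):
    """Shannon entropy (bits) of the motif's base composition.

    Assumed proxy for sequence complexity: 0 for a homopolymer,
    2 when A, C, G and T are equally represented.
    """
    counts = Counter(motif)
    n = len(motif)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def proximal_frequency(motif, upstreams, window=1000):
    """Fraction of upstream sequences with the motif in the last `window` bases.

    Assumes each string ends at the translational start, so the last
    1000 characters are the 1 kb immediately upstream.
    """
    if not upstreams:
        return 0.0
    hits = sum(1 for seq in upstreams if motif in seq[-window:])
    return hits / len(upstreams)


def score(motif, upstreams, window=1000):
    """Hypothetical combined score: proximal frequency times complexity."""
    return proximal_frequency(motif, upstreams, window) * complexity(motif)


def rank(motifs, upstreams, window=1000):
    """Return candidate motifs ordered from highest to lowest score."""
    return sorted(motifs, key=lambda m: score(m, upstreams, window), reverse=True)


# Toy data only: two made-up upstream regions, one carrying TTACCGAA.
toy_upstreams = ["G" * 10 + "TTACCGAA" + "G" * 5, "G" * 20]
ranked = rank(["AAAAAAAA", "TTACCGAA"], toy_upstreams)
```

Under these assumptions a low-complexity motif such as a homopolymer scores zero regardless of how often it occurs, which mirrors the reasoning for discounting simple motifs of the kind reported by MEME.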