The gene structure and the polymorphism of the human complement component C4
1. The DNA sequence of the human complement C4A gene from a cosmid clone, Cos 3A3, was determined and the complete exon-intron structure elucidated. The 5' flanking region of the C4 gene contains three TATA sequences and a transcriptional enhancer core sequence, which are >200 nucleotides (nt) and 60-70 nt upstream from the CAP site, respectively. The gene consists of 42 exons coding for a precursor protein of 1745 residues. The first exon codes for a 51 nt 5' untranslated sequence, a leader peptide of 19 residues, and the N-terminus of the β chain. The β-α and the α-γ chain junctions are encoded by exons 17 and 34, respectively. The anaphylatoxin C4a and the thiolester site are encoded by phase 1-1 symmetrical exons. Most of the amino acids encoded at the splice junctions are polar or charged. Between exons 10 and 11 is a 6-7 kb intron that is flanked by direct long terminal repeats and may be absent in some C4 genes located at the second C4 locus. The last exon codes for the C-terminus of the γ chain and a 140 bp 3' untranslated sequence. The intergenic region between the C4 gene and its neighbouring 21-hydroxylase (21-OHase) gene is ~3028 bp.
2. Eighteen polymorphic amino acids on C4 have been identified through genomic DNA, cDNA and protein sequencing. Fourteen of them are located on the α chain (C4a: 2 changes; C4d: 12 changes). The rest are scattered on the β and the γ chains. There are potential size variations by one residue on the β chain, and by a tripeptide that contains a sulphation site on the α chain.
3. Four common and rare C4 alleles have been cloned from individuals whose C4 proteins were chemically and serologically characterised. Analysis of the sequences at the C4d regions has allowed the identification of the C4A/C4B isotypic residues at positions 1101-6: C4A has the sequence PCPVLD, while C4B has the sequence LSPVIH. Presumably these isotypic residues are the cause of the class-specific, differential chemical reactivities.
Moreover, the probable locations for the two Rodgers (Rg) and the six Chido (Ch) antigenic determinants were deduced. The C4B isotypic residues may be involved in the expression of the Ch2 and the Ch4 epitopes, while the C4A isotypic residues may not be related to either of the Rg determinants.
4. Definitive restriction fragment length polymorphisms (RFLPs) representing the exact locations responsible for the isotypicity between C4A and C4B, and for their generally associated Rg1 and Ch1 antigenic determinants, have been designed. In combination with the Taq I polymorphic patterns specific for the C4 and for the 21-OHase gene loci, it has been shown that the null allele of the HLA haplotype B44 DR6 C4A3 C4BQ0 is not a C4B allele, but probably encodes another C4A3 allotype at the second C4 locus.
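As an illustration of the isotypic residues described in point 3, the following minimal Python sketch compares the two hexapeptides at positions 1101-6 and assigns an isotype to a given hexapeptide. Only the two sequences (PCPVLD for C4A, LSPVIH for C4B) and the position range come from the text; the function names and the "unknown" fallback are hypothetical, and this is not the method used in the work itself.

```python
# Isotypic hexapeptides at residues 1101-1106, as stated in the abstract.
C4A_MOTIF = "PCPVLD"
C4B_MOTIF = "LSPVIH"
FIRST_POSITION = 1101  # residue number of the first motif position


def differing_positions(a=C4A_MOTIF, b=C4B_MOTIF, start=FIRST_POSITION):
    """Return (position, C4A residue, C4B residue) for each mismatch."""
    return [
        (start + i, res_a, res_b)
        for i, (res_a, res_b) in enumerate(zip(a, b))
        if res_a != res_b
    ]


def classify_isotype(hexapeptide):
    """Hypothetical helper: label a hexapeptide covering residues 1101-1106."""
    if hexapeptide == C4A_MOTIF:
        return "C4A"
    if hexapeptide == C4B_MOTIF:
        return "C4B"
    return "unknown"  # assumption: anything else is left unassigned


if __name__ == "__main__":
    for position, res_a, res_b in differing_positions():
        print(f"{position}: C4A {res_a} / C4B {res_b}")
    print(classify_isotype("LSPVIH"))
```

The comparison shows that four of the six positions (1101, 1102, 1105 and 1106) differ between the two isotypes, while 1103 and 1104 are shared.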