Collagens contain a large number of globular domains in between the regions of triple helical repeats (IPR008160). These domains are involved in binding diverse substrates. One of these domains is found at the C terminus of fibrillar collagens. The exact function of this domain is unknown.
Mutations in fibrillar collagens (types I, II, III, and XI), fibril-associated collagen (type IX), and network-forming collagen (type X) cause a spectrum of diseases of bone, cartilage, and blood vessels.
Hum Mutat. 1997; 9: 300-15
This review summarizes the data on 278 different mutations found to date in the genes for types I, II, III, IX, X, and XI collagens from 317 apparently unrelated patients. A majority (217 mutations; 78% of the total) of the mutations are single-base and either change the codon of a critical amino acid (63%), or lead to abnormal RNA splicing (13%). Most of the amino acid substitutions are those of a bulkier amino acid for the obligatory glycine of the repeating -Gly-X-Y- sequence of the collagen triple helix (155; 56%). Altogether, 26 different mutations (9.4% of the mutations) occur in more than one unrelated individual. The 65 patients in whom the 26 mutations were characterized constitute almost one-fifth (20.5%) of the 317 patients analyzed. The mutations in types I, II, III, IX, X, and XI collagens cause a wide spectrum of diseases of bone, cartilage, and blood vessels, including osteogenesis imperfecta, a variety of chondrodysplasias, types IV and VII of the Ehlers-Danlos syndrome, and, rarely, some forms of osteoporosis, osteoarthritis, and familial aneurysms.
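The percentages quoted in this abstract can be re-derived from its raw counts. The following is a minimal sketch of that arithmetic; the helper name `percent` and the variable names are illustrative only and do not come from the review.

```python
# Counts quoted in the abstract (Hum Mutat. 1997; 9: 300-15).
TOTAL_MUTATIONS = 278
TOTAL_PATIENTS = 317


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Single-base mutations as a share of all mutations (quoted as 78%).
single_base = percent(217, TOTAL_MUTATIONS)
# Glycine substitutions as a share of all mutations (quoted as 56%).
glycine_substitutions = percent(155, TOTAL_MUTATIONS)
# Mutations seen in more than one unrelated individual (quoted as 9.4%).
recurrent_mutations = percent(26, TOTAL_MUTATIONS)
# Patients carrying one of those recurrent mutations (quoted as 20.5%).
recurrent_patients = percent(65, TOTAL_PATIENTS)

print(single_base, glycine_substitutions, recurrent_mutations, recurrent_patients)
```

Note that the first two shares use the number of mutations as the denominator, while the last one uses the number of patients.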
Collagens are typical mosaic proteins containing a number of shuffled domains. These domains have been classified by sequence similarity in order to characterize their structural and functional relationships to other proteins. This analysis provides an overview of homologies of collagen domains. It also reveals two new relationships: (i) a module common to type V, IX, XI, and XII collagens was found to be homologous to the heparin binding domain of thrombospondin; (ii) the modular architecture of a human type VII collagen fragment was identified. Its N-terminal globular domain contains fibronectin type III repeats located adjacent to a Von Willebrand factor type A module. The proposed structural similarities point to analogous subfunctions of the respective domains in otherwise distinct proteins.
A conserved nucleotide sequence, coding for a segment of the C-propeptide, is found at the same location in different collagen genes.
Nucleic Acids Res. 1983; 11: 2733-44
The nucleotide sequence of a segment of the chick alpha 1 type III collagen gene which codes for the C-propeptide was determined and compared with the corresponding sequence in the alpha 1 type I and alpha 2 type I collagen genes. As in the alpha 2 type I gene the coding information for the C-propeptide of the type III collagen gene is subdivided in four exons. Similarly, the amino proximal exon contains sequences for both the carboxy terminal end of the alpha-helical segment of collagen and for the beginning of the C-propeptide in both genes. Therefore, this organization of exons must have been established before these two collagen genes arose by duplication of a common ancestor. In several subsegments the deduced amino acid sequence for the C-propeptide of type III collagen shows a strong homology with the corresponding amino acid sequence in alpha 1 and alpha 2 type I. For one of these homologous amino acid sequences, however, the nucleotide sequence is much better conserved than for the others. It is possible that a mechanism of gene conversion has maintained the homogeneity of this nucleotide sequence among the interstitial collagen genes. Alternatively, the conserved nucleotide sequence may represent a regulatory signal which could function either in the DNA or in the RNA.
Covalent structure of collagen: amino acid sequence of alpha 1(III)-CB9 from type III collagen of human liver.
Biochemistry. 1981; 20: 2621-7
The peptide alpha 1(III)-CB9 was prepared and purified from human liver, and its amino acid sequence was determined. Automated Edman degradation of the intact peptide and peptides derived from selective cleavage with hydroxylamine and digestions with trypsin, thermolysin, and Staph V8 protease enabled determination of the complete amino acid sequence. The peptide alpha 1(III)-CB9 represents the COOH terminus of the helical (pepsin-resistant) portion of type III collagen and terminates in a Cys-Cys sequence responsible for the intramolecular disulfide cross-linkages with other chains. The present work completes the entire amino acid sequence of the helical (pepsin-resistant) portion of human cirrhotic liver type III collagen consisting of peptides alpha 1-(III)-CB3-7-6-1-8-10-2-4-5-9. The COOH terminus of human liver alpha 1(III) contained two additional triplets which, together with the extra triplet at the NH2 terminus in alpha 1(III)-CB3, make the helical portion of type III collagen longer than alpha 1(I) by nine residues (three Gly-X-Y triplets). The helical region of human liver type III collagen, therefore, consists of 1023 amino acids or 341 triplets.
Fifty-four kilobase pairs (kbp) of cloned chicken DNA containing the entire 38-kbp pro alpha 2 (I) collagen gene have been isolated and characterized. DNA sequence analysis of a select 4 kbp of the gene has precisely described 14 exons which comprise one-third of the sequences encoding the triple-helical domain of the collagen protein. These exons range in size from 45 to 108 base pairs (bp), are all multiples of the 9 bp that code for the repeating triplet, Gly-X-Y, and have an average size of 70 bp. About 50 introns interrupt this gene. Nevertheless, introns do not separate the coding sequences for the ends of the central triple-helical structural domain and the ends of the propeptide domains.
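The statement that every exon is a multiple of 9 bp means each exon encodes a whole number of Gly-X-Y triplets (three codons of 3 bp each). A minimal sketch of that relationship follows; the function name `triplets_in_exon` is hypothetical, and the list of sizes is only the set of lengths consistent with the stated 45 to 108 bp range, not the exon sizes actually observed in the gene.

```python
BP_PER_CODON = 3
CODONS_PER_TRIPLET = 3  # Gly, X, Y
BP_PER_TRIPLET = BP_PER_CODON * CODONS_PER_TRIPLET  # 9 bp


def triplets_in_exon(exon_bp):
    """Return the number of complete Gly-X-Y triplets encoded by an exon.

    Raises ValueError if the exon length is not a multiple of 9 bp.
    """
    if exon_bp % BP_PER_TRIPLET != 0:
        raise ValueError(f"{exon_bp} bp is not a multiple of {BP_PER_TRIPLET} bp")
    return exon_bp // BP_PER_TRIPLET


# Exon lengths compatible with the stated range (45 to 108 bp, multiples of 9).
compatible_sizes = list(range(45, 108 + 1, BP_PER_TRIPLET))

smallest = triplets_in_exon(45)   # triplets in the shortest exon
largest = triplets_in_exon(108)   # triplets in the longest exon
print(compatible_sizes, smallest, largest)
```

The reported average of 70 bp is not itself a multiple of 9, which is expected for a mean taken over exons of different lengths.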
Disease (disease genes where sequence variants are found in this domain)
SwissProt sequences and OMIM curated human diseases associated with missense mutations within the COLFI domain.
This information is based on a mapping of the SMART genomic protein database to KEGG orthologous groups. Percentages are calculated relative to the number of COLFI domain-containing proteins that could be assigned to a KEGG orthologous group, not relative to all proteins containing the COLFI domain. Note that a protein can be included in multiple pathways, i.e., the numbers will not always add up to 100%.
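To illustrate why such percentages need not sum to 100%, here is a minimal sketch using invented data. The protein identifiers, pathway names, and the function `pathway_percentages` are all hypothetical and are not taken from SMART or KEGG; the sketch only assumes that the denominator is the number of proteins with at least one assignment.

```python
from collections import Counter

# Hypothetical assignments: protein -> pathways (illustrative data only).
protein_pathways = {
    "protein_1": ["pathway_A", "pathway_B"],  # counted in two pathways
    "protein_2": ["pathway_A"],
    "protein_3": ["pathway_B"],
    "protein_4": [],                          # no assignment: excluded
}


def pathway_percentages(assignments):
    """Percentage of assigned proteins that fall into each pathway."""
    assigned = {p: paths for p, paths in assignments.items() if paths}
    counts = Counter(path for paths in assigned.values() for path in set(paths))
    return {path: 100 * n / len(assigned) for path, n in counts.items()}


percentages = pathway_percentages(protein_pathways)
total = sum(percentages.values())
print(percentages, total)
```

Because `protein_1` contributes to both pathways, the per-pathway percentages add up to more than 100%, while the unassigned `protein_4` does not enter the denominator at all.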