Secondary literature sources for B5
The following references were automatically generated.
- Cerrudo CS, Mengual Gomez DL, Gomez DE, Ghiringhelli PD
- Novel insights into the evolution and structural characterization of dyskerin using comprehensive bioinformatics analysis.
- J Proteome Res. 2015; 14: 874-87
Dyskerin is a conserved nucleolar protein. Several related genetic diseases are caused by defects in dyskerin. We hypothesized that having a comprehensive bioinformatic analysis of dyskerin will help to develop new drugs for these diseases. We predicted protein domains and compared sequences and structures to detect the universe of dyskerin-like proteins. We identified conserved features of shared domains in the three superkingdoms. We analyzed the phylogenetic diversity, confirming that there is a strong structural conservation. Also, we studied the relationship of dyskerin-like proteins with other proteins through an integrative protein-protein interaction approach. Most of them are conserved among homologous eukaryotic and archaeal proteins. Our results highlighted the preservation of proteins interacting with dyskerin. We identified conserved dyskerin interactor proteins between the different eukaryotic organisms. Furthermore, we studied the existence of dyskerin-like proteins in different species. Also, we compared and analyzed the secondary structure with the hydrophobic profile, confirming that all have hydrophilic properties highly conserved among proteins. The greatest difference was observed in the NTE and CTE regions. Another aspect studied was the comparison and analysis of tertiary structures. To our knowledge, this is the first time that these analyses were performed in such a comprehensive manner.
- Petitjean C, Deschamps P, Lopez-Garcia P, Moreira D, Brochier-Armanet C
- Extending the conserved phylogenetic core of archaea disentangles the evolution of the third domain of life.
- Mol Biol Evol. 2015; 32: 1242-54
Initial studies of the archaeal phylogeny relied mainly on the analysis of the RNA component of the small subunit of the ribosome (SSU rRNA). The resulting phylogenies have provided interesting but partial information on the evolutionary history of the third domain of life because SSU rRNA sequences do not contain enough phylogenetic signal to resolve all nodes of the archaeal tree. Thus, many relationships, and especially the most ancient ones, remained elusive. Moreover, SSU rRNA phylogenies can be heavily biased by tree reconstruction artifacts. The sequencing of complete genomes allows using a variety of protein markers as an alternative to SSU rRNA. Taking advantage of the recent burst of archaeal complete genome sequences, we have carried out an in-depth phylogenomic analysis of this domain. We have identified 200 new protein families that, in addition to the ribosomal proteins and the subunits of the RNA polymerase, form a conserved phylogenetic core of archaeal genes. The accurate analysis of these markers combined with desaturation approaches shed new light on the evolutionary history of Archaea and reveals that several relationships recovered in recent analyses are likely the consequence of tree reconstruction artifacts. Among others, we resolve a number of important relationships, such as those among methanogens Class I, and we propose the definition of two new superclasses within the Euryarchaeota: Methanomada and Diaforarchaea.
- Kaasalainen U, Olsson S, Rikkinen J
- Evolution of the tRNALeu (UAA) Intron and Congruence of Genetic Markers in Lichen-Symbiotic Nostoc.
- PLoS One. 2015; 10: 131223-131223
The group I intron interrupting the tRNALeu UAA gene (trnL) is present in most cyanobacterial genomes as well as in the plastids of many eukaryotic algae and all green plants. In lichen symbiotic Nostoc, the P6b stem-loop of trnL intron always involves one of two different repeat motifs, either Class I or Class II, both with unresolved evolutionary histories. Here we attempt to resolve the complex evolution of the two different trnL P6b region types. Our analysis indicates that the Class II repeat motif most likely appeared first and that independent and unidirectional shifts to the Class I motif have since taken place repeatedly. In addition, we compare our results with those obtained with other genetic markers and find strong evidence of recombination in the 16S rRNA gene, a marker widely used in phylogenetic studies on Bacteria. The congruence of the different genetic markers is successfully evaluated with the recently published software Saguaro, which has not previously been utilized in comparable studies.
- Petitjean C, Deschamps P, Lopez-Garcia P, Moreira D
- Rooting the domain archaea by phylogenomic analysis supports the foundation of the new kingdom Proteoarchaeota.
- Genome Biol Evol. 2015; 7: 191-204
The first 16S rRNA-based phylogenies of the Archaea showed a deep division between two groups, the kingdoms Euryarchaeota and Crenarchaeota. This bipartite classification has been challenged by the recent discovery of new deeply branching lineages (e.g., Thaumarchaeota, Aigarchaeota, Nanoarchaeota, Korarchaeota, Parvarchaeota, Aenigmarchaeota, Diapherotrites, and Nanohaloarchaeota) which have also been given the same taxonomic status of kingdoms. However, the phylogenetic position of some of these lineages is controversial. In addition, phylogenetic analyses of the Archaea have often been carried out without outgroup sequences, making it difficult to determine if these taxa actually define lineages at the same level as the Euryarchaeota and Crenarchaeota. We have addressed the question of the position of the root of the Archaea by reconstructing rooted archaeal phylogenetic trees using bacterial sequences as outgroup. These trees were based on commonly used conserved protein markers (32 ribosomal proteins) as well as on 38 new markers identified through phylogenomic analysis. We thus gathered a total of 70 conserved markers that we analyzed as a concatenated data set. In contrast with previous analyses, our trees consistently placed the root of the archaeal tree between the Euryarchaeota (including the Nanoarchaeota and other fast-evolving lineages) and the rest of archaeal species, which we propose to class within the new kingdom Proteoarchaeota. This implies the relegation of several groups previously classified as kingdoms (e.g., Crenarchaeota, Thaumarchaeota, Aigarchaeota, and Korarchaeota) to a lower taxonomic rank. In addition to taxonomic implications, this profound reorganization of the archaeal phylogeny has also consequences on our appraisal of the nature of the last archaeal ancestor, which most likely was a complex organism with a gene-rich genome.
- Koonin EV, Yutin N
- The dispersed archaeal eukaryome and the complex archaeal ancestor of eukaryotes.
- Cold Spring Harb Perspect Biol. 2014; 6: 16188-16188
The ancestral set of eukaryotic genes is a chimera composed of genes of archaeal and bacterial origins thanks to the endosymbiosis event that gave rise to the mitochondria and apparently antedated the last common ancestor of the extant eukaryotes. The proto-mitochondrial endosymbiont is confidently identified as an alpha-proteobacterium. In contrast, the archaeal ancestor of eukaryotes remains elusive, although evidence is accumulating that it could have belonged to a deep lineage within the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota) superphylum of the Archaea. Recent surveys of archaeal genomes show that the apparent ancestors of several key functional systems of eukaryotes, the components of the archaeal "eukaryome," such as ubiquitin signaling, RNA interference, and actin-based and tubulin-based cytoskeleton structures, are identifiable in different archaeal groups. We suggest that the archaeal ancestor of eukaryotes was a complex form, rooted deeply within the TACK superphylum, that already possessed some quintessential eukaryotic features, in particular, a cytoskeleton, and perhaps was capable of a primitive form of phagocytosis that would facilitate the engulfment of potential symbionts. This putative group of Archaea could have existed for a relatively short time before going extinct or undergoing genome streamlining, resulting in the dispersion of the eukaryome. This scenario might explain the difficulty with the identification of the archaeal ancestor of eukaryotes despite the straightforward detection of apparent ancestors to many signature eukaryotic functional systems.
- Caetano-Anolles G, Mittenthal JE, Caetano-Anolles D, Kim KM
- A calibrated chronology of biochemistry reveals a stem line of descent responsible for planetary biodiversity.
- Front Genet. 2014; 5: 306-306
Time-calibrated phylogenomic trees of protein domain structure produce powerful chronologies describing the evolution of biochemistry and life. These timetrees are built from a genomic census of millions of encoded proteins using models of nested accumulation of molecules in evolving proteomes. Here we show that a primordial stem line of descent, a propagating series of pluripotent cellular entities, populates the deeper branches of the timetrees. The stem line produced for the first time cellular grades ~2.9 billion years (Gy)-ago, which slowly turned into lineages of superkingdom Archaea. Prompted by the rise of planetary oxygen and aerobic metabolism, the stem line also produced bacterial and eukaryal lineages. Superkingdom-specific domain repertoires emerged ~2.1 Gy-ago delimiting fully diversified Bacteria. Repertoires specific to Eukarya and Archaea appeared 300 million years later. Results reconcile reductive evolutionary processes leading to the early emergence of Archaea to superkingdom-specific innovations compatible with a tree of life rooted in Bacteria.
- Tshori S, Razin E, Nechushtan H
- Amino-acyl tRNA synthetases generate dinucleotide polyphosphates as second messengers: functional implications.
- Top Curr Chem. 2014; 344: 189-206
In this chapter we describe aminoacyl-tRNA synthetase (aaRS) production of dinucleotide polyphosphate in response to stimuli, their interaction with various signaling pathways, and the role of diadenosine tetraphosphate and diadenosine triphosphate as second messengers. The primary role of aaRS is to mediate aminoacylation of cognate tRNAs, thereby providing a central role for the decoding of genetic code during protein translation. However, recent studies suggest that during evolution, "moonlighting" or non-canonical roles were acquired through incorporation of additional domains, leading to regulation by aaRSs of a spectrum of important biological processes, including cell cycle control, tissue differentiation, cellular chemotaxis, and inflammation. In addition to aminoacylation of tRNA, most aaRSs can also produce dinucleotide polyphosphates in a variety of physiological conditions. The dinucleotide polyphosphates produced by aaRS are biologically active both extra- and intra-cellularly, and seem to function as important signaling molecules. Recent findings established the role of dinucleotide polyphosphates as second messengers.
- Dunin-Horkawicz S, Kopec KO, Lupas AN
- Prokaryotic ancestry of eukaryotic protein networks mediating innate immunity and apoptosis.
- J Mol Biol. 2014; 426: 1568-82
Protein domains characteristic of eukaryotic innate immunity and apoptosis have many prokaryotic counterparts of unknown function. By reconstructing interactomes computationally, we found that bacterial proteins containing these domains are part of a network that also includes other domains not hitherto associated with immunity. This network is connected to the network of prokaryotic signal transduction proteins, such as histidine kinases and chemoreceptors. The network varies considerably in domain composition and degree of paralogy, even between strains of the same species, and its repetitive domains are often amplified recently, with individual repeats sharing up to 100% sequence identity. Both phenomena are evidence of considerable evolutionary pressure and thus compatible with a role in the "arms race" between host and pathogen. In order to investigate the relationship of this network to its eukaryotic counterparts, we performed a cluster analysis of organisms based on a census of its constituent domains across all fully sequenced genomes. We obtained a large central cluster of mainly unicellular organisms, from which multicellular organisms radiate out in two main directions. One is taken by multicellular bacteria, primarily cyanobacteria and actinomycetes, and plants form an extension of this direction, connected via the basal, unicellular cyanobacteria. The second main direction is taken by animals and fungi, which form separate branches with a common root in the alpha-proteobacteria of the central cluster. This analysis supports the notion that the innate immunity networks of eukaryotes originated from their endosymbionts and that increases in the complexity of these networks accompanied the emergence of multicellularity.
- Luque I, Ochoa de Alda JA
- CURT1,CAAD-containing aaRSs, thylakoid curvature and gene translation.
- Trends Plant Sci. 2014; 19: 63-6
CURT1 proteins induce membrane curvature to grana margins in Arabidopsis (Arabidopsis thaliana) thylakoids. A domain sharing sequence and structural features with CURT1 is found in some cyanobacterial aminoacyl-tRNA synthetases (aaRSs) that show an unusual localization to the thylakoid membranes. Evolutionary scenarios and functional implications are discussed in this article.
- Imanian B, Keeling PJ
- Horizontal gene transfer and redundancy of tryptophan biosynthetic enzymes in dinotoms.
- Genome Biol Evol. 2014; 6: 333-43
A tertiary endosymbiosis between a dinoflagellate host and diatom endosymbiont gave rise to "dinotoms," cells with a unique nuclear and mitochondrial redundancy derived from two evolutionarily distinct eukaryotic lineages. To examine how this unique redundancy might have affected the evolution of metabolic systems, we investigated the transcription of genes involved in biosynthesis of the amino acid tryptophan in three species, Durinskia baltica, Kryptoperidinium foliaceum, and Glenodinium foliaceum. From transcriptome sequence data, we recovered two distinct sets of protein-coding transcripts covering the entire tryptophan biosynthetic pathway. Phylogenetic analyses suggest a diatom origin for one set of the proteins, which we infer to be expressed in the endosymbiont, and that the other arose from multiple horizontal gene transfer events to the dinoflagellate ancestor of the host lineage. This is the first indication that these cells retain redundant sets of transcripts and likely metabolic pathways for the biosynthesis of small molecules and extend their redundancy to their two distinct nuclear genomes.
- Droge J, Buczek D, Suzuki Y, Makalowski W
- Amoebozoa possess lineage-specific globin gene repertoires gained by individual horizontal gene transfers.
- Int J Biol Sci. 2014; 10: 689-701
The Amoebozoa represent a clade of unicellular amoeboid organisms that display a wide variety of lifestyles, including free-living and parasitic species. For example, the social amoeba Dictyostelium discoideum has the ability to aggregate into a multicellular fruiting body upon starvation, while the pathogenic amoeba Entamoeba histolytica is a parasite of humans. Globins are small heme proteins that are present in almost all extant organisms. Although several genomes of amoebozoan species have been sequenced, little is known about the phyletic distribution of globin genes within this phylum. Only two flavohemoglobins (FHbs) of D. discoideum have been reported and characterized previously while the genomes of Entamoeba species are apparently devoid of globin genes. We investigated eleven amoebozoan species for the presence of globin genes by genomic and phylogenetic in silico analyses. Additional FHb genes were identified in the genomes of four social amoebas and the true slime mold Physarum polycephalum. Moreover, a single-domain globin (SDFgb) of Hartmannella vermiformis, as well as two truncated hemoglobins (trHbs) of Acanthamoeba castellanii were identified. Phylogenetic evidence suggests that these globin genes were independently acquired via horizontal gene transfer from some ancestral bacteria. Furthermore, the phylogenetic tree of amoebozoan FHbs indicates that they do not share a common ancestry and that a transfer of FHbs from bacteria to amoeba occurred multiple times.
- Guo M, Yang XL
- Architecture and metamorphosis.
- Top Curr Chem. 2014; 344: 89-118
When compared to other conserved housekeeping protein families, such as ribosomal proteins, during the evolution of higher eukaryotes, aminoacyl-tRNA synthetases (aaRSs) show an apparent high propensity to add new sequences, and especially new domains. The stepwise emergence of those new domains is consistent with their involvement in a broad range of biological functions beyond protein synthesis, and correlates with the increasing biological complexity of higher organisms. These new domains have been extensively characterized based on their evolutionary origins and their sequence, structural, and functional features. While some of the domains are uniquely found in aaRSs and may have originated from nucleic acid binding motifs, others are common domain modules mediating protein-protein interactions that play a critical role in the assembly of the multi-synthetase complex (MSC). Interestingly, the MSC has emerged from a miniature complex in yeast to a large stable complex in humans. The human MSC consists of nine aaRSs (LysRS, ArgRS, GlnRS, AspRS, MetRS, IleRS, LeuRS, GluProRS, and bifunctional aaRs) and three scaffold proteins (AIMP1/p43, AIMP2/p38, and AIMP3/p18), and has a molecular weight of 1.5 million Dalton. The MSC has been proposed to have a functional dualism: facilitating protein synthesis and serving as a reservoir of non-canonical functions associated with its synthetase and non-synthetase components. Importantly, domain additions and functional expansions are not limited to the components of the MSC and are found in almost all aaRS proteins. From a structural perspective, multi-functionalities are represented by multiple conformational states. In fact, alternative conformations of aaRSs have been generated by various mechanisms from proteolysis to alternative splicing and posttranslational modifications, as well as by disease-causing mutations. 
Therefore, the metamorphosis between different conformational states is connected to the activation and regulation of the novel functions of aaRSs in higher eukaryotes.
- Kaushik S, Sowdhamini R
- Distribution, classification, domain architectures and evolution of prolyl oligopeptidases in prokaryotic lineages.
- BMC Genomics. 2014; 15: 985-985
BACKGROUND: Prolyl oligopeptidases (POPs) are proteolytic enzymes, widely distributed in all the kingdoms of life. Bacterial POPs are pharmaceutically important enzymes, yet their functional and evolutionary details are not fully explored. Therefore, current analysis is aimed at understanding the distribution, domain architecture, probable biological functions and gene family expansion of POPs in bacterial and archaeal lineages. RESULTS: Exhaustive sequence analysis of 1,202 bacterial and 91 archaeal genomes revealed ~3,000 POP homologs, with only 638 annotated POPs. We observed wide distribution of POPs in all the analysed bacterial lineages. Phylogenetic analysis and co-clustering of POPs of different phyla suggested their common functions in all the prokaryotic species. Further, on the basis of unique sequence motifs we could classify bacterial POPs into eight subtypes. Analysis of coexisting domains in POPs highlighted their involvement in protein-protein interactions and cellular signaling. We proposed significant extension of this gene family by characterizing 39 new POPs and 158 new alpha/beta hydrolase members. CONCLUSIONS: Our study reflects diversity and functional importance of POPs in bacterial species. Many genomes with multiple POPs were identified with high sequence variations and different cellular localizations. Such anomalous distribution of POP genes in different bacterial genomes shows differential expansion of POP gene family primarily by multiple horizontal gene transfer events.
- Karamichali I, Koumandou VL, Karagouni AD, Kossida S
- Frequent gene fissions associated with human pathogenic bacteria.
- Genomics. 2014; 103: 65-75
Gene fusion and fission events are important for evolutionary studies and for predicting protein-protein interactions. Previous studies have shown that fusion events always predominate over fission events and, in their majority, they represent singular events throughout evolution. In this project, the role of fusion and fission events in the genome evolution of 104 human bacterial pathogens was studied. 141 protein pairs were identified to be involved in gene fusion or fission events. Surprisingly, we find that, in the species analyzed, gene fissions prevail over fusions. Moreover, while most events appear to have occurred only once in evolution, 23% of the gene fusion and fission events identified are deduced to have occurred independently multiple times. Comparison of the analyzed bacteria with non-pathogenic close relatives indicates that this impressive result is associated with the recent evolutionary history of the human bacterial pathogens, and thus is probably caused by their pathogenic lifestyle.
- Godinic-Mikulcic V et al.
- Archaeal aminoacyl-tRNA synthetases interact with the ribosome to recycle tRNAs.
- Nucleic Acids Res. 2014; 42: 5191-201
Aminoacyl-tRNA synthetases (aaRS) are essential enzymes catalyzing the formation of aminoacyl-tRNAs, the immediate precursors for encoded peptides in ribosomal protein synthesis. Previous studies have suggested a link between tRNA aminoacylation and high-molecular-weight cellular complexes such as the cytoskeleton or ribosomes. However, the structural basis of these interactions and potential mechanistic implications are not well understood. To biochemically characterize these interactions we have used a system of two interacting archaeal aaRSs: an atypical methanogenic-type seryl-tRNA synthetase and an archaeal ArgRS. More specifically, we have shown by thermophoresis and surface plasmon resonance that these two aaRSs bind to the large ribosomal subunit with micromolar affinities. We have identified the L7/L12 stalk and the proteins located near the stalk base as the main sites for aaRS binding. Finally, we have performed a bioinformatics analysis of synonymous codons in the Methanothermobacter thermautotrophicus genome that supports a mechanism in which the deacylated tRNAs may be recharged by aaRSs bound to the ribosome and reused at the next occurrence of a codon encoding the same amino acid. These results suggest a mechanism of tRNA recycling in which aaRSs associate with the L7/L12 stalk region to recapture the tRNAs released from the preceding ribosome in polysomes.
- Kodavali PK, Dudkiewicz M, Pikula S, Pawlowski K
- Bioinformatics analysis of bacterial annexins--putative ancestral relatives of eukaryotic annexins.
- PLoS One. 2014; 9: 85428-85428
Annexins are Ca(2+)-binding, membrane-interacting proteins, widespread among eukaryotes, consisting usually of four structurally similar repeated domains. It is accepted that vertebrate annexins derive from a double genome duplication event. It has been postulated that a single domain annexin, if found, might represent a molecule related to the hypothetical ancestral annexin. The recent discovery of a single-domain annexin in a bacterium, Cytophaga hutchinsonii, apparently confirmed this hypothesis. Here, we present a more complex picture. Using remote sequence similarity detection tools, a survey of bacterial genomes was performed in search of annexin-like proteins. In total, we identified about thirty annexin homologues, including single-domain and multi-domain annexins, in seventeen bacterial species. The thorough search yielded, besides the known annexin homologue from C. hutchinsonii, homologues from the Bacteroidetes/Chlorobi phylum, from Gemmatimonadetes, from beta- and delta-Proteobacteria, and from Actinobacteria. The sequences of bacterial annexins exhibited remote but statistically significant similarity to sequence profiles built of the eukaryotic ones. Some bacterial annexins are equipped with additional, different domains, for example those characteristic for toxins. The variation in bacterial annexin sequences, much wider than that observed in eukaryotes, and different domain architectures suggest that annexins found in bacteria may actually descend from an ancestral bacterial annexin, from which eukaryotic annexins also originate. The hypothesis of an ancient origin of bacterial annexins has to be reconciled with the fact that remarkably few bacterial strains possess annexin genes compared to the thousands of known bacterial genomes and with the patchy, anomalous phylogenetic distribution of bacterial annexins. Thus, a massive annexin gene loss in several bacterial lineages or very divergent evolution would appear a likely explanation. 
Alternative evolutionary scenarios, involving horizontal gene transfer between bacteria and protozoan eukaryotes, in either direction, appear much less likely. Altogether, current evidence does not allow unequivocal judgement as to the origin of bacterial annexins.
- Yin LF et al.
- Evolutionary analysis revealed the horizontal transfer of the Cyt b gene from Fungi to Chromista.
- Mol Phylogenet Evol. 2014; 76: 155-61
In this study, the cytochrome b (Cyt b) amino acid sequences were analyzed in 50 organisms covering all 5 kingdoms of eukaryotes. Six conserved domains, i.e., heme bL binding sites, heme bH binding sites, Qo binding sites, Qi binding sites, the interchain domain interface, and the intrachain domain interface were found in all investigated sequences. The topology of the phylogenetic trees was largely consistent with the well recognized taxonomic relationships, indicating that the Cyt b genes originated from a common ancestral gene before the divergence of eukaryotic kingdoms. The eukaryotic Cyt b genes likely originated from an ancient prokaryotic gene in Alphaproteobacteria based on shared conserved domains. We provide evidence that the Cyt b gene of oomycete Pseudoperonospora cubensis was horizontally transferred from a fungus in the order Hypocreales. To our knowledge, this is the first reported evidence of Horizontal gene transfer (HGT) from Fungi to Chromista involving an essential house-keeping gene. Our data suggest that HGT events must be considered when evolutionary trees are constructed only based on Cyt b genes. Additional analysis of thousands of Cyt b sequences from Genbank revealed that introns in mitochondrial Cyt b genes were acquired after the endosymbiosis of alphaproteobacteria in eukaryotic cells.
- Dasgupta S, Basu G
- Evolutionary insights about bacterial GlxRS from whole genome analyses: is GluRS2 a chimera?
- BMC Evol Biol. 2014; 14: 26-26
BACKGROUND: Evolutionary histories of glutamyl-tRNA synthetase (GluRS) and glutaminyl-tRNA synthetase (GlnRS) in bacteria are convoluted. After the divergence of eubacteria and eukarya, bacterial GluRS glutamylated both tRNAGln and tRNAGlu until GlnRS appeared by horizontal gene transfer (HGT) from eukaryotes or a duplicate copy of GluRS (GluRS2) that only glutamylates tRNAGln appeared. The current understanding is based on limited sequence data and not always compatible with available experimental results. In particular, the origin of GluRS2 is poorly understood. RESULTS: A large database of bacterial GluRS, GlnRS, tRNAGln and the trimeric aminoacyl-tRNA-dependent amidotransferase (gatCAB), constructed from whole genomes by functionally annotating and classifying these enzymes according to their mutual presence and absence in the genome, was analyzed. Phylogenetic analyses showed that the catalytic and the anticodon-binding domains of functional GluRS2 (as in Helicobacter pylori) were independently acquired from evolutionarily distant hosts by HGT. Non-functional GluRS2 (as in Thermotoga maritima), on the other hand, was found to contain an anticodon-binding domain appended to a gene-duplicated catalytic domain. Several genomes were found to possess both GluRS2 and GlnRS, even though they share the common function of aminoacylating tRNAGln. GlnRS was widely distributed among bacterial phyla and although phylogenetic analyses confirmed the origin of most bacterial GlnRS to be through a single HGT from eukarya, many GlnRS sequences also appeared with evolutionarily distant phyla in phylogenetic tree. A GlnRS pseudogene could be identified in Sorangium cellulosum. CONCLUSIONS: Our analysis broadens the current understanding of bacterial GlxRS evolution and highlights the idiosyncratic evolution of GluRS2. 
Specifically we show that: i) GluRS2 is a chimera of mismatching catalytic and anticodon-binding domains, ii) the appearance of GlnRS and GluRS2 in a single bacterial genome indicating that the evolutionary histories of the two enzymes are distinct, iii) GlnRS is more widespread in bacteria than is believed, iv) bacterial GlnRS appeared both by HGT from eukarya and intra-bacterial HGT, v) presence of GlnRS pseudogene shows that many bacteria could not retain the newly acquired eukaryal GlnRS. The functional annotation of GluRS, without recourse to experiments, performed in this work, demonstrates the inherent and unique advantages of using whole genome over isolated sequence databases.
- Taniguchi T et al.
- Decoding system for the AUA codon by tRNAIle with the UAU anticodon in Mycoplasma mobile.
- Nucleic Acids Res. 2013; 41: 2621-31
Deciphering the genetic code is a fundamental process in all living organisms. In many bacteria, AUA codons are deciphered by tRNA(Ile2) bearing lysidine (L) at the wobble position. L is a modified cytidine introduced post-transcriptionally by tRNA(Ile)-lysidine synthetase (TilS). Some bacteria, including Mycoplasma mobile, do not carry the tilS gene, indicating that they have established a different system to decode AUA codons. In this study, tRNA(Ile2) has been isolated from M. mobile and was found to contain a UAU anticodon without any modification. Mycoplasma mobile isoleucyl-tRNA synthetase (IleRS) recognized the UAU anticodon, whereas Escherichia coli IleRS did not efficiently aminoacylate tRNA(Ile2)(UAU). In M. mobile IleRS, a single Arg residue at position 865 was critical for specificity for the UAU anticodon and, when the corresponding site (W905) in E. coli IleRS was substituted with Arg, the W905R mutant efficiently aminoacylated tRNA with UAU anticodon. Mycoplasma mobile tRNA(Ile2) cannot distinguish between AUA and AUG codon on E. coli ribosome. However, on M. mobile ribosome, M. mobile tRNA(Ile2)(UAU) specifically recognized AUA codon, and not AUG codon, suggesting M. mobile ribosome has a property that prevents misreading of AUG codon. These findings provide an insight into the evolutionary reorganization of the AUA decoding system.
- Chandrasekaran SN, Yardimci GG, Erdogan O, Roach J, Carter CW Jr
- Statistical evaluation of the Rodin-Ohno hypothesis: sense/antisense coding of ancestral class I and II aminoacyl-tRNA synthetases.
- Mol Biol Evol. 2013; 30: 1588-604
We tested the idea that ancestral class I and II aminoacyl-tRNA synthetases arose on opposite strands of the same gene. We assembled excerpted 94-residue Urgenes for class I tryptophanyl-tRNA synthetase (TrpRS) and class II Histidyl-tRNA synthetase (HisRS) from a diverse group of species, by identifying and catenating three blocks coding for secondary structures that position the most highly conserved, active-site residues. The codon middle-base pairing frequency was 0.35 +/- 0.0002 in all-by-all sense/antisense alignments for 211 TrpRS and 207 HisRS sequences, compared with frequencies between 0.22 +/- 0.0009 and 0.27 +/- 0.0005 for eight different representations of the null hypothesis. Clustering algorithms demonstrate further that profiles of middle-base pairing in the synthetase antisense alignments are correlated along the sequences from one species-pair to another, whereas this is not the case for similar operations on sets representing the null hypothesis. Most probable reconstructed sequences for ancestral nodes of maximum likelihood trees show that middle-base pairing frequency increases to approximately 0.42 +/- 0.002 as bacterial trees approach their roots; ancestral nodes from trees including archaeal sequences show a less pronounced increase. Thus, contemporary and reconstructed sequences all validate important bioinformatic predictions based on descent from opposite strands of the same ancestral gene. They further provide novel evidence for the hypothesis that bacteria lie closer than archaea to the origin of translation. Moreover, the inverse polarity of genetic coding, together with a priori alpha-helix propensities suggest that in-frame coding on opposite strands leads to similar secondary structures with opposite polarity, as observed in TrpRS and HisRS crystal structures.
- Song D, Cho WK, Park SH, Jo Y, Kim KH
- Evolution of and horizontal gene transfer in the Endornavirus genus.
- PLoS One. 2013; 8: e64270
The transfer of genetic information between unrelated species is referred to as horizontal gene transfer. Previous studies have demonstrated that both retroviral and non-retroviral sequences have been integrated into eukaryotic genomes. Recently, we identified many non-retroviral sequences in plant genomes. In this study, we investigated the evolutionary origin and gene transfer of domains present in endornaviruses which are double-stranded RNA viruses. Using the available sequences for endornaviruses, we found that Bell pepper endornavirus-like sequences homologous to the glycosyltransferase 28 domain are present in plants, fungi, and bacteria. The phylogenetic analysis revealed the glycosyltransferase 28 domain of Bell pepper endornavirus may have originated from bacteria. In addition, two domains of Oryza sativa endornavirus, a glycosyltransferase sugar-binding domain and a capsular polysaccharide synthesis protein, also exhibited high similarity to those of bacteria. We found evidence that at least four independent horizontal gene transfer events for the glycosyltransferase 28 domain have occurred among plants, fungi, and bacteria. The glycosyltransferase sugar-binding domains of two proteobacteria may have been horizontally transferred to the genome of Thalassiosira pseudonana. Our study is the first to show that three glycome-related viral genes in the genus Endornavirus have been acquired from marine bacteria by horizontal gene transfer.
- Harish A, Tunlid A, Kurland CG
- Rooted phylogeny of the three superkingdoms.
- Biochimie. 2013; 95: 1593-604
The traditional bacterial rooting of the three superkingdoms in sequence-based gene trees is inconsistent with new phylogenetic reconstructions based on genome content of compact protein domains. We find that protein domains at the level of the SCOP superfamily (SF) from sequenced genomes implement with maximum parsimony fully resolved rooted trees. Such genome content trees identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. LACA and LECA descend in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium. Rather, MRUCA presents 75% of the unique SFs encoded by extant genomes of the three superkingdoms, each encoding a proteome that partially overlaps all others. This alone implies that the common ancestor to the superkingdoms was very complex. Such ancestral complexity is confirmed by phylogenetic reconstructions. In addition, the divergence of proteomes from the complex ancestor in each superkingdom is both reductive in numbers of unique SFs as well as cumulative in the abundance of surviving SFs. These data suggest that the common ancestor was not the first cell lineage and that modern global phylogeny is the crown of a "recently" re-rooted tree. We suggest that a bottlenecked survivor of an environmental collapse, which preceded the flourishing of the modern crown, seeded the current phylogenetic tree.
- Zmasek CM, Godzik A
- Evolution of the animal apoptosis network.
- Cold Spring Harb Perspect Biol. 2013; 5: a008649
The number of available eukaryotic genomes has expanded to the point where we can evaluate the complete evolutionary history of many cellular processes. Such analyses for the apoptosis regulatory networks suggest that this network already existed in the ancestor of the entire animal kingdom (Metazoa) in a form more complex than in some popular animal model organisms. This supports the growing realization that regulatory networks do not necessarily evolve from simple to complex and that the relative simplicity of these networks in nematodes and insects does not represent an ancestral state, but is the result of secondary simplifications. Network evolution is not a process of monotonous increase in complexity, but a dynamic process that includes lineage-specific gene losses and expansions, protein domain reshuffling, and emergence/reemergence of similar protein architectures by parallel evolution. Studying the evolution of such networks is a challenging yet interesting subject for research and investigation, and such studies on the apoptosis networks provide us with interesting hints of how these networks, critical in so many human diseases, have developed.
- Grau-Bove X, Sebe-Pedros A, Ruiz-Trillo I
- A genomic survey of HECT ubiquitin ligases in eukaryotes reveals independent expansions of the HECT system in several lineages.
- Genome Biol Evol. 2013; 5: 833-47
The posttranslational modification of proteins by the ubiquitination pathway is an important regulatory mechanism in eukaryotes. To date, however, studies on the evolutionary history of the proteins involved in this pathway have been restricted to E1 and E2 enzymes, whereas E3 studies have focused mainly on metazoans and plants. To have a wider perspective, here we perform a genomic survey of the HECT family of E3 ubiquitin-protein ligases, an important part of this posttranslational pathway, in genomes from representatives of all major eukaryotic lineages. We classify eukaryotic HECTs and reconstruct, by phylogenetic analysis, the putative repertoire of these proteins in the last eukaryotic common ancestor (LECA). Furthermore, we analyze the diversity and complexity of protein domain architectures of HECTs along the different extant eukaryotic lineages. Our data show that LECA had six different HECTs and that protein expansion and N-terminal domain diversification shaped HECT evolution. Our data reveal that the genomes of animals and unicellular holozoans considerably increased the molecular and functional diversity of their HECT system compared with other eukaryotes. Other eukaryotes, such as the Apusozoa Thecamonas trahens or the Heterokonta Phytophthora infestans, independently expanded their HECT repertoire. In contrast, plant, excavate, rhodophyte, chlorophyte, and fungal genomes have a more limited enzymatic repertoire. Our genomic survey and phylogenetic analysis clarify the origin and evolution of different HECT families among eukaryotes and provide a useful phylogenetic framework for future evolutionary studies of this regulatory pathway.
- Lynch M
- Evolutionary diversification of the multimeric states of proteins.
- Proc Natl Acad Sci U S A. 2013; 110: E2821-8
One of the most striking features of proteins is their common assembly into multimeric structures, usually homomers with even numbers of subunits all derived from the same genetic locus. However, although substantial structural variation for orthologous proteins exists within and among major phylogenetic lineages, in striking contrast to patterns of gene structure and genome organization, there appears to be no correlation between the level of protein structural complexity and organismal complexity. In addition, there is no evidence that protein architectural differences are driven by lineage-specific differences in selective pressures. Here, it is suggested that variation in the multimeric states of proteins can readily arise from stochastic transitions resulting from the joint processes of mutation and random genetic drift, even in the face of constant directional selection for one particular protein architecture across all lineages. Under the proposed hypothesis, on a long evolutionary timescale, the numbers of transitions from monomers to dimers should approximate the numbers in the opposite direction and similarly for transitions between higher-order structures.
- Swithers KS, Soucy SM, Lasek-Nesselquist E, Lapierre P, Gogarten JP
- Distribution and evolution of the mobile vma-1b intein.
- Mol Biol Evol. 2013; 30: 2676-87
Inteins are self-splicing parasitic genetic elements found in all domains of life. These genetic elements are found in highly conserved positions in conserved proteins. One protein family that has been invaded by inteins is the vacuolar and archaeal catalytic ATPase subunits (vma-1). There are two intein insertion sites in this protein, "a" and "b." The b site was previously thought to be only invaded in archaeal lineages. Here we survey the distribution and evolutionary histories of the b site inteins and show that the intein is present in more lineages than previously annotated, including a bacterial lineage, Mahella australiensis 50-1 BON. We present evidence, through ancestral character state reconstruction and substitution ratios between host genes and inteins, for several transfers of this intein between divergent species, including an interdomain transfer between the archaea and bacteria. Although inteins may persist within a single population or species for long periods of time, transfer of the vma-1b intein between divergent species contributed to the distribution of this intein.
- Alvarez-Ponce D, Lopez P, Bapteste E, McInerney JO
- Gene similarity networks provide tools for understanding eukaryote origins and evolution.
- Proc Natl Acad Sci U S A. 2013; 110: E1594-603
The complexity and depth of the relationships between the three domains of life challenge the reliability of phylogenetic methods, encouraging the use of alternative analytical tools. We reconstructed a gene similarity network comprising the proteomes of 14 eukaryotes, 104 prokaryotes, 2,389 viruses and 1,044 plasmids. This network contains multiple signatures of the chimerical origin of Eukaryotes as a fusion of an archaebacterium and a eubacterium that could not have been observed using phylogenetic trees. A number of connected components (gene sets with stronger similarities than expected by chance) contain pairs of eukaryotic sequences exhibiting no direct detectable similarity. Instead, many eukaryotic sequences were indirectly connected through a "eukaryote-archaebacterium-eubacterium-eukaryote" similarity path. Furthermore, eukaryotic genes highly connected to prokaryotic genes from one domain tend not to be connected to genes from the other prokaryotic domain. Genes of archaebacterial and eubacterial ancestry tend to perform different functions and to act at different subcellular compartments, but in such an intertwined way that suggests an early rather than late integration of both gene repertoires. The archaebacterial repertoire has a similar size in all eukaryotic genomes whereas the number of eubacterium-derived genes is much more variable, suggesting a higher plasticity of this gene repertoire. Consequently, highly reduced eukaryotic genomes contain more genes of archaebacterial than eubacterial affinity. Connected components with prokaryotic and eukaryotic genes tend to include viral and plasmid genes, compatible with a role of gene mobility in the origin of Eukaryotes. Our analyses highlight the power of network approaches to study deep evolutionary events.
- Sand A, Steel M
- The standard lateral gene transfer model is statistically consistent for pectinate four-taxon trees.
- J Theor Biol. 2013; 335: 295-8
Evolutionary events such as incomplete lineage sorting and lateral gene transfers constitute major problems for inferring species trees from gene trees, as they can sometimes lead to gene trees which conflict with the underlying species tree. One particularly simple and efficient way to infer species trees from gene trees under such conditions is to combine three-taxon analyses for several genes using a majority vote approach. For incomplete lineage sorting this method is known to be statistically consistent; however, for lateral gene transfers it was recently shown that a zone of inconsistency exists for a specific four-taxon tree topology, and it was posed as an open question whether inconsistencies could exist for other four-taxon tree topologies. In this letter we analyze all remaining four-taxon topologies and show that no other inconsistencies exist.
- Wu YC, Rasmussen MD, Kellis M
- Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny.
- Mol Biol Evol. 2012; 29: 689-705
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of approximately 9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.
- Suen S, Lu HH, Yeang CH
- Evolution of domain architectures and catalytic functions of enzymes in metabolic systems.
- Genome Biol Evol. 2012; 4: 976-93
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic systems evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, their evolution of domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
- Fahey B, Degnan BM
- Origin and evolution of laminin gene family diversity.
- Mol Biol Evol. 2012; 29: 1823-36
Laminins are a family of multidomain glycoproteins that are important contributors to the structure of metazoan extracellular matrices. To investigate the origin and evolution of the laminin family, we characterized the full complement of laminin-related genes in the genome of the sponge, Amphimedon queenslandica. As a representative of the Demospongiae, a group consistently placed within the earliest diverging branch of animals by molecular phylogenies, Amphimedon is uniquely placed to provide insight into early steps in the evolution of metazoan gene families. Five Amphimedon laminin-related genes possess the conserved molecular features, and most of the domains found in bilaterian laminins, but all display domain architectures distinct from those of the canonical laminin chain types known from model bilaterians. This finding prompted us to perform a comparative genomic analysis of laminins and related genes from a choanoflagellate and diverse metazoans and to conduct phylogenetic analyses using the conserved Laminin N-terminal domain in order to explore the relationships between genes with distinct architectures. Laminin-like genes appear to have originated in the holozoan lineage (choanoflagellates + metazoans + several other unicellular opisthokont taxa), with several laminin domains originating later and appearing only in metazoan (animal) or eumetazoan (placozoans + ctenophores + cnidarians + bilaterians) laminins. Typical bilaterian alpha, beta, and gamma laminin chain forms arose in the eumetazoan stem and another chain type that is conserved in Amphimedon, the cnidarian, Nematostella vectensis, and the echinoderm, Strongylocentrotus purpuratus, appears to have been lost independently from the placozoan, Trichoplax adhaerens, and from multiple bilaterians. 
Phylogenetic analysis did not clearly reconstruct relationships between the distinct laminin chain types (with the exception of the alpha chains) but did reveal how several members of the netrin family were generated independently from within the laminin family by duplication and domain shuffling and by domain loss. Together, our results suggest that gene duplication and loss and domain shuffling and loss all played a role in the evolution of the laminin family and contributed to the generation of lineage-specific diversity in the laminin gene complements of extant metazoans.
- Zamocky M, Gasselhuber B, Furtmuller PG, Obinger C
- Molecular evolution of hydrogen peroxide degrading enzymes.
- Arch Biochem Biophys. 2012; 525: 131-44
For efficient removal of intra- and/or extracellular hydrogen peroxide by dismutation to harmless dioxygen and water (2 H2O2 → O2 + 2 H2O), nature designed three metalloenzyme families that differ in oligomeric organization, monomer architecture as well as active site geometry and catalytic residues. Here we report on the updated reconstruction of the molecular phylogeny of these three gene families. Ubiquitous typical (monofunctional) heme catalases are found in all domains of life showing a high structural conservation. Their evolution was directed from large subunit towards small subunit proteins and further to fused proteins where the catalase fold was retained but lost its original functionality. Bifunctional catalase-peroxidases were at the origin of one of the two main heme peroxidase superfamilies (i.e. peroxidase-catalase superfamily) and constitute a protein family predominantly present among eubacteria and archaea, but two evolutionary branches are also found in the eukaryotic world. Non-heme manganese catalases are a relatively small protein family with very old roots only present among bacteria and archaea. Phylogenetic analyses of the three protein families reveal features typical (i) for the evolution of whole genomes as well as (ii) for specific evolutionary events including horizontal gene transfer, paralog formation and gene fusion. As catalases have reached a striking diversity among prokaryotic and eukaryotic pathogens, understanding their phylogenetic and molecular relationship and function will contribute to drug design for prevention of diseases of humans, animals and plants.
- Gadakh B, Van Aerschot A
- Aminoacyl-tRNA synthetase inhibitors as antimicrobial agents: a patent review from 2006 till present.
- Expert Opin Ther Pat. 2012; 22: 1453-65
INTRODUCTION: Aminoacyl-tRNA synthetases (aaRSs) are one of the leading targets for development of antimicrobial agents. Although these enzymes are well conserved among prokaryotes, significant divergence has occurred between prokaryotic and eukaryotic aaRSs, which can be exploited in the discovery of broad-spectrum antibacterial agents. Although several aaRS inhibitors have been reported before, they failed as a result of poor selectivity and limited cell penetration. AREAS COVERED: This review covers January 2006 to April 2012 wherein several new analogues were claimed as aaRS inhibitors. Anacor Pharmaceuticals patented several boron-containing derivatives inhibiting the function of the editing domain of aaRSs. Two patents describe the combination of aaRS inhibitors with other antibacterial agents. Patents disclosing aaRS inhibitors for indications other than antimicrobial agents are not considered for review here. EXPERT OPINION: Several recently disclosed leads may form the foundation for development of potent and selective bacterial aaRS inhibitors. In comparison with, for example, terbinafine and itraconazole, compound C10 (AN2690) is a very promising candidate for treatment of ungual and periungual infections with improved nail penetration and low keratin binding. In addition, Replidyne, Inc. reported bicyclic heteroaromatic compounds as potent and selective inhibitors of bacterial MetRS. These have proven to be particularly effective for treatment of Clostridium difficile-associated diarrhea. Finally, combination of aaRS inhibitors to attenuate resistance appears to be a viable strategy to expand the lifespan of existing antibiotics.
- Chang CP, Tseng YK, Ko CY, Wang CC
- Alanyl-tRNA synthetase genes of Vanderwaltozyma polyspora arose from duplication of a dual-functional predecessor of mitochondrial origin.
- Nucleic Acids Res. 2012; 40: 314-22
In eukaryotes, the cytoplasmic and mitochondrial forms of a given aminoacyl-tRNA synthetase (aaRS) are typically encoded by two orthologous nuclear genes, one of eukaryotic origin and the other of mitochondrial origin. We herein report a novel scenario of aaRS evolution in yeast. While all other yeast species studied possess a single nuclear gene encoding both forms of alanyl-tRNA synthetase (AlaRS), Vanderwaltozyma polyspora, a yeast species descended from the same whole-genome duplication event as Saccharomyces cerevisiae, contains two distinct nuclear AlaRS genes, one specifying the cytoplasmic form and the other its mitochondrial counterpart. The protein sequences of these two isoforms are very similar to each other. The isoforms are actively expressed in vivo and are exclusively localized in their respective cellular compartments. Despite the presence of a promising AUG initiator candidate, the gene encoding the mitochondrial form is actually initiated from upstream non-AUG codons. A phylogenetic analysis further revealed that all yeast AlaRS genes, including those in V. polyspora, are of mitochondrial origin. These findings underscore the possibility that contemporary AlaRS genes in V. polyspora arose relatively recently from duplication of a dual-functional predecessor of mitochondrial origin.
- Pylro VS, Vespoli Lde S, Duarte GF, Yotoko KS
- Detection of horizontal gene transfers from phylogenetic comparisons.
- Int J Evol Biol. 2012; 2012: 813015
Bacterial phylogenies have become one of the most important challenges for microbial ecology. This field started in the mid-1970s with the aim of using the sequence of the small subunit ribosomal RNA (16S) tool to infer bacterial phylogenies. Phylogenetic hypotheses based on other sequences usually give conflicting topologies that reveal different evolutionary histories, which in some cases may be the result of horizontal gene transfer events. Currently, one of the major goals of molecular biology is to understand the role that horizontal gene transfer plays in species adaptation and evolution. In this work, we compared the phylogenetic tree based on 16S with the tree based on dszC, a gene involved in the cleavage of carbon-sulfur bonds. Bacteria of several genera perform this survival task when living in environments lacking free mineral sulfur. The biochemical pathway of the desulphurization process was extensively studied due to its economic importance, since this step is expensive and indispensable in fuel production. Our results clearly show that horizontal gene transfer events could be detected using common phylogenetic methods with gene sequences obtained from public sequence databases.
- Goncearenco A, Berezovsky IN
- Exploring the evolution of protein function in Archaea.
- BMC Evol Biol. 2012; 12: 75
BACKGROUND: Despite recent progress in studies of the evolution of protein function, the questions of what the first functional protein domains were and what their basic building blocks were remain unresolved. Previously, we introduced the concept of elementary functional loops (EFLs), which are the functional units of enzymes that provide elementary reactions in biochemical transformations. They are presumably descendants of primordial catalytic peptides. RESULTS: We analyzed distant evolutionary connections between protein functions in Archaea based on the EFLs comprising them. We show examples of the involvement of EFLs in new functional domains, as well as reutilization of EFLs and functional domains in building multidomain structures and protein complexes. CONCLUSIONS: Our analysis of the archaeal superkingdom yields the dominating mechanisms in different periods of protein evolution, which resulted in several levels of the organization of biochemical function. First, functional domains emerged as combinations of prebiotic peptides with very basic functions, such as nucleotide/phosphate and metal cofactor binding. Second, domain recombination brought to the evolutionary scene the multidomain proteins and complexes. Later, reutilization and de novo design of functional domains and elementary functional loops complemented evolution of protein function.
- Leclere L, Rentzsch F
- Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins.
- Genome Biol Evol. 2012; 4: 883-99
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help in understanding protein evolution.
- Persi E, Weingart U, Freilich S, Horn D
- Peptide markers of aminoacyl tRNA synthetases facilitate taxa counting in metagenomic data.
- BMC Genomics. 2012; 13: 65
BACKGROUND: Taxa counting is a major problem faced by analysis of metagenomic data. The most popular method relies on analysis of 16S rRNA sequences, but some studies also employ protein-based analyses. It would be advantageous to have a method that is applicable directly to short sequences, of the kind extracted from samples in modern metagenomic research. This is achieved by the technique proposed here. RESULTS: We employ specific peptides, deduced from aminoacyl tRNA synthetases, as markers for the occurrence of single genes in data. Sequences carrying these markers are aligned and compared with each other to provide a lower limit for taxa counts in metagenomic data. The method is compared with 16S rRNA searches on a set of known genomes. The taxa counting problem is analyzed mathematically and a heuristic algorithm is proposed. When applied to genomic contigs of a recent human gut microbiome study, the taxa counting method provides information on numbers of different species and strains. We then apply our method to short read data and demonstrate how it can be calibrated to cope with errors. Comparison to known databases leads to estimates of the percentage of novelties, and the type of phyla involved. CONCLUSIONS: A major advantage of our method is its simplicity: it relies on searching sequences for the occurrence of just 4000 specific peptides belonging to the S61 subgroup of aaRS enzymes. When compared to other methods, it provides additional insight into the taxonomic contents of metagenomic data. Furthermore, it can be directly applied to short read data, avoiding the need for genomic contig reconstruction, and taking into account short reads that are otherwise discarded as singletons. Hence it is very suitable for a fast analysis of next-generation sequencing data.
- Rybarczyk-Mydlowska K et al.
- Rather than by direct acquisition via lateral gene transfer, GHF5 cellulases were passed on from early Pratylenchidae to root-knot and cyst nematodes.
- BMC Evol Biol. 2012; 12: 221
BACKGROUND: Plant parasitic nematodes are unusual Metazoans as they are equipped with genes that allow for symbiont-independent degradation of plant cell walls. Among the cell wall-degrading enzymes, glycoside hydrolase family 5 (GHF5) cellulases are relatively well characterized, especially for high impact parasites such as root-knot and cyst nematodes. Interestingly, ancestors of extant nematodes most likely acquired these GHF5 cellulases from a prokaryote donor by one or multiple lateral gene transfer events. To obtain insight into the origin of GHF5 cellulases among evolutionary advanced members of the order Tylenchida, cellulase biodiversity data from less distal family members were collected and analyzed. RESULTS: Single nematodes were used to obtain (partial) genomic sequences of cellulases from representatives of the genera Meloidogyne, Pratylenchus, Hirschmanniella and Globodera. Combined Bayesian analysis of approximately 100 cellulase sequences revealed three types of catalytic domains (A, B, and C). Represented by 84 sequences, type B is numerically dominant, and the overall topology of the catalytic domain type shows remarkable resemblance with trees based on neutral (= pathogenicity-unrelated) small subunit ribosomal DNA sequences. Bayesian analysis further suggested a sister relationship between the lesion nematode Pratylenchus thornei and all type B cellulases from root-knot nematodes. Yet, the relationship between the three catalytic domain types remained unclear. Superposition of intron data onto the cellulase tree suggests that types B and C are related, and together distinct from type A that is characterized by two unique introns. CONCLUSIONS: All Tylenchida members investigated here harbored one or multiple GHF5 cellulases. Three types of catalytic domains are distinguished, and the presence of at least two types is relatively common among plant parasitic Tylenchida. 
Analysis of coding sequences of cellulases suggests that root-knot and cyst nematodes did not acquire this gene directly by lateral gene transfer. More likely, these genes were passed on by ancestors of a family nowadays known as the Pratylenchidae.
- Koonin EV, Wolf YI
- Evolution of microbes and viruses: a paradigm shift in evolutionary biology?
- Front Cell Infect Microbiol. 2012; 2: 119-119
When Charles Darwin formulated the central principles of evolutionary biology in the Origin of Species in 1859 and the architects of the Modern Synthesis integrated these principles with population genetics almost a century later, the principal if not the sole objects of evolutionary biology were multicellular eukaryotes, primarily animals and plants. Before the advent of efficient gene sequencing, all attempts to extend evolutionary studies to bacteria have been futile. Sequencing of the rRNA genes in thousands of microbes allowed the construction of the three-domain "ribosomal Tree of Life" that was widely thought to have resolved the evolutionary relationships between the cellular life forms. However, subsequent massive sequencing of numerous, complete microbial genomes revealed novel evolutionary phenomena, the most fundamental of these being: (1) pervasive horizontal gene transfer (HGT), in large part mediated by viruses and plasmids, that shapes the genomes of archaea and bacteria and calls for a radical revision (if not abandonment) of the Tree of Life concept, (2) Lamarckian-type inheritance that appears to be critical for antivirus defense and other forms of adaptation in prokaryotes, and (3) evolution of evolvability, i.e., dedicated mechanisms for evolution such as vehicles for HGT and stress-induced mutagenesis systems. In the non-cellular part of the microbial world, phylogenomics and metagenomics of viruses and related selfish genetic elements revealed enormous genetic and molecular diversity and extremely high abundance of viruses that come across as the dominant biological entities on earth. Furthermore, the perennial arms race between viruses and their hosts is one of the defining factors of evolution. Thus, microbial phylogenomics adds new dimensions to the fundamental picture of evolution even as the principle of descent with modification discovered by Darwin and the laws of population genetics remain at the core of evolutionary biology.
- Nocek B et al.
- Structural and functional characterization of microcin C resistance peptidase MccF from Bacillus anthracis.
- J Mol Biol. 2012; 420: 366-83
Microcin C (McC) is a heptapeptide adenylate antibiotic produced by Escherichia coli strains carrying the mccABCDEF gene cluster encoding enzymes, in addition to the heptapeptide structural gene mccA, necessary for McC biosynthesis and self-immunity of the producing cell. The heptapeptide facilitates McC transport into susceptible cells, where it is processed releasing a non-hydrolyzable aminoacyl adenylate that inhibits an essential aminoacyl-tRNA synthetase. The self-immunity gene mccF encodes a specialized serine peptidase that cleaves an amide bond connecting the peptidyl or aminoacyl moieties of, respectively, intact and processed McC with the nucleotidyl moiety. Most mccF orthologs from organisms other than E. coli are not linked to the McC biosynthesis gene cluster. Here, we show that a protein product of one such gene, MccF from Bacillus anthracis (BaMccF), is able to cleave intact and processed McC, and we present a series of structures of this protein. Structural analysis of apo-BaMccF and its adenosine monophosphate complex reveals specific features of MccF-like peptidases that allow them to interact with substrates containing nucleotidyl moieties. Sequence analyses and phylogenetic reconstructions suggest that several distinct subfamilies form the MccF clade of the large S66 family of bacterial serine peptidases. We show that various representatives of the MccF clade can specifically detoxify non-hydrolyzable aminoacyl adenylates differing in their aminoacyl moieties. We hypothesize that bacterial mccF genes serve as a source of bacterial antibiotic resistance.
- Sinsheimer JS, Little RJ, Lake JA
- Rooting gene trees without outgroups: EP rooting.
- Genome Biol Evol. 2012; 4: 709-19
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167-181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301-316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60-76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489-493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763-766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255-260).
- Yutin N, Koonin EV
- Archaeal origin of tubulin.
- Biol Direct. 2012; 7: 10-10
Tubulins are a family of GTPases that are key components of the cytoskeleton in all eukaryotes and are distantly related to the FtsZ GTPase that is involved in cell division in most bacteria and many archaea. Among prokaryotes, bona fide tubulins have been identified only in bacteria of the genus Prosthecobacter. These bacterial tubulin genes appear to have been horizontally transferred from eukaryotes. Here we describe tubulins encoded in the genomes of thaumarchaeota of the genus Nitrosoarchaeum that we denote artubulins. Phylogenetic analysis results are compatible with the origin of eukaryotic tubulins from artubulins. These findings expand the emerging picture of the origin of key components of eukaryotic functional systems from ancestral forms that are scattered among the extant archaea.
- Hoeppner MP, Gardner PP, Poole AM
- Comparative analysis of RNA families reveals distinct repertoires for each domain of life.
- PLoS Comput Biol. 2012; 8: 1002752-1002752
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically-encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer (HGT), and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner.
- Sallman Almen M, Bringeland N, Fredriksson R, Schioth HB
- The dispanins: a novel gene family of ancient origin that contains 14 human members.
- PLoS One. 2012; 7: 31961-31961
The Interferon induced transmembrane proteins (IFITM) are a family of transmembrane proteins that is known to inhibit cell invasion of viruses such as HIV-1 and influenza. We show that the IFITM genes are a subfamily in a larger family of transmembrane (TM) proteins that we call Dispanins, which refers to a common 2TM structure. We mined the Dispanins in 36 eukaryotic species, covering all major eukaryotic groups, and investigated their evolutionary history using Bayesian and maximum likelihood approaches to infer a phylogenetic tree. We identified ten human genes that together with the known IFITM genes form the Dispanin family. We show that the Dispanins first emerged in eukaryotes in a common ancestor of choanoflagellates and metazoa, and that the family later expanded in vertebrates where it forms four subfamilies (A-D). Interestingly, we also find that the family is found in several different phyla of bacteria and propose that it was horizontally transferred to eukaryotes from bacteria in the common ancestor of choanoflagellates and metazoa. The bacterial and eukaryotic sequences have a considerably conserved protein structure. In conclusion, we introduce a novel family, the Dispanins, together with a nomenclature based on the evolutionary origin.
- Derelle R, Lang BF
- Rooting the eukaryotic tree with mitochondrial and bacterial proteins.
- Mol Biol Evol. 2012; 29: 1277-89
By exploiting the large body of genome data and the considerable progress in phylogenetic methodology, recent phylogenomic studies have provided new insights into the relationships among major eukaryotic groups. However, confident placement of the eukaryotic root remains a major challenge. This is due to the large evolutionary distance separating eukaryotes from their closest relatives, the Archaea, implying a weak phylogenetic signal and strong long-branch attraction artifacts. Here, we apply a new approach to the rooting of the eukaryotic tree by using a subset of genomic information with more recent evolutionary origin: mitochondrial sequences, whose closest relatives are alpha-Proteobacteria. For this, we identified and assembled a data set of 42 mitochondrial proteins (mainly encoded by the nuclear genome) and performed Bayesian and maximum likelihood analyses. Taxon sampling includes the recently sequenced Thecamonas trahens, a member of the phylogenetically elusive Apusozoa. This data set confirms the relationships of several eukaryotic supergroups seen before and places the eukaryotic root between the monophyletic "unikonts" and "bikonts." We further show that T. trahens branches sister to Opisthokonta with significant statistical support and question the bikont/excavate affiliation of Malawimonas species. The mitochondrial data set developed here (to be expanded in the future) constitutes a unique alternative means in resolving deep eukaryotic relationships.
- Takeuchi N, Wolf YI, Makarova KS, Koonin EV
- Nature and intensity of selection pressure on CRISPR-associated genes.
- J Bacteriol. 2012; 194: 1216-25
The recently discovered CRISPR-Cas adaptive immune system is present in almost all archaea and many bacteria. It consists of cassettes of CRISPR repeats that incorporate spacers homologous to fragments of viral or plasmid genomes that are employed as guide RNAs in the immune response, along with numerous CRISPR-associated (cas) genes that encode proteins possessing diverse, only partially characterized activities required for the action of the system. Here, we investigate the evolution of the cas genes and show that they evolve under purifying selection that is typically much weaker than the median strength of purifying selection affecting genes in the respective genomes. The exceptions are the cas1 and cas2 genes that typically evolve at levels of purifying selection close to the genomic median. Thus, although these genes are implicated in the acquisition of spacers from alien genomes, they do not appear to be directly involved in an arms race between bacterial and archaeal hosts and infectious agents. These genes might possess functions distinct from and additional to their role in the CRISPR-Cas-mediated immune response. Taken together with evidence of the frequent horizontal transfer of cas genes reported previously and with the wide-spread microscale recombination within these genes detected in this work, these findings reveal the highly dynamic evolution of cas genes. This conclusion is in line with the involvement of CRISPR-Cas in antiviral immunity that is likely to entail a coevolutionary arms race with rapidly evolving viruses. However, we failed to detect evidence of strong positive selection in any of the cas genes.
- Mattoo S et al.
- Comparative analysis of Histophilus somni immunoglobulin-binding protein A (IbpA) with other fic domain-containing enzymes reveals differences in substrate and nucleotide specificities.
- J Biol Chem. 2011; 286: 32834-42
A new family of adenylyltransferases, defined by the presence of a Fic domain, was recently discovered to catalyze the addition of adenosine monophosphate (AMP) to Rho GTPases (Yarbrough, M. L., Li, Y., Kinch, L. N., Grishin, N. V., Ball, H. L., and Orth, K. (2009) Science 323, 269-272; Worby, C. A., Mattoo, S., Kruger, R. P., Corbeil, L. B., Koller, A., Mendez, J. C., Zekarias, B., Lazar, C., and Dixon, J. E. (2009) Mol. Cell 34, 93-103). This adenylylation event inactivates Rho GTPases by preventing them from binding to their downstream effectors. We reported that the Fic domain(s) of the immunoglobulin-binding protein A (IbpA) from the pathogenic bacterium Histophilus somni adenylylates mammalian Rho GTPases, RhoA, Rac1, and Cdc42, thereby inducing host cytoskeletal collapse, which allows H. somni to breach alveolar barriers and cause septicemia. The IbpA-mediated adenylylation occurs on a functionally critical tyrosine in the switch 1 region of these GTPases. Here, we conduct a detailed characterization of the IbpA Fic2 domain and compare its activity with other known Fic adenylyltransferases, VopS (Vibrio outer protein S) from the bacterial pathogen Vibrio parahaemolyticus and the human protein HYPE (huntingtin yeast interacting protein E; also called FicD). We also included the Fic domains of the secreted protein, PfhB2, from the opportunistic pathogen Pasteurella multocida, in our analysis. PfhB2 shares a common domain architecture with IbpA and contains two Fic domains. We demonstrate that the PfhB2 Fic domains also possess adenylyltransferase activity that targets the switch 1 tyrosine of Rho GTPases. Comparative kinetic and phylogenetic analyses of IbpA-Fic2 with the Fic domains of PfhB2, VopS, and HYPE reveal important aspects of their specificities for Rho GTPases and nucleotide usage and offer mechanistic insights for determining nucleotide and substrate specificities for these enzymes. 
Finally, we compare the evolutionary lineages of Fic proteins with those of other known adenylyltransferases.
- Zmasek CM, Godzik A
- Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires.
- Genome Biol. 2011; 12: 4-4
BACKGROUND: Genome size and complexity, as measured by the number of genes or protein domains, is remarkably similar in most extant eukaryotes and generally exhibits no correlation with their morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes. RESULTS: We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, during eukaryotic evolution domain losses in general outnumber domain gains. Only at the base of the animal and the vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering of genomes according to their functional profiles results in an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely being replaced by the metabolic capabilities of symbiotic organisms such as gut microbes. CONCLUSIONS: While protein domain gains and losses are common throughout eukaryote evolution, losses oftentimes outweigh gains and lead to significant differences in functional profiles. Results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gain in regulatory complexity during the rise of animals.
- Larson ET et al.
- Structure of Leishmania major methionyl-tRNA synthetase in complex with intermediate products methionyladenylate and pyrophosphate.
- Biochimie. 2011; 93: 570-82
Leishmania parasites cause two million new cases of leishmaniasis each year with several hundreds of millions of people at risk. Due to the paucity and shortcomings of available drugs, we have undertaken the crystal structure determination of a key enzyme from Leishmania major in hopes of creating a platform for the rational design of new therapeutics. Crystals of the catalytic core of methionyl-tRNA synthetase from L. major (LmMetRS) were obtained with the substrates MgATP and methionine present in the crystallization medium. These crystals yielded the 2.0 Å resolution structure of LmMetRS in complex with two products, methionyladenylate and pyrophosphate, along with a Mg(2+) ion that bridges them. This is the first class I aminoacyl-tRNA synthetase (aaRS) structure with pyrophosphate bound. The residues of the class I aaRS signature sequence motifs, KISKS and HIGH, make numerous contacts with the pyrophosphate. Substantial differences between the LmMetRS structure and previously reported complexes of Escherichia coli MetRS (EcMetRS) with analogs of the methionyladenylate intermediate product are observed, even though one of these analogs only differs by one atom from the intermediate. The source of these structural differences is attributed to the presence of the product pyrophosphate in LmMetRS. Analysis of the LmMetRS structure in light of the Aquifex aeolicus MetRS-tRNA(Met) complex shows that major rearrangements of multiple structural elements of enzyme and/or tRNA are required to allow the CCA acceptor triplet to reach the methionyladenylate intermediate in the active site. Comparison with sequences of human cytosolic and mitochondrial MetRS reveals interesting differences near the ATP- and methionine-binding regions of LmMetRS, suggesting that it should be possible to obtain compounds that selectively inhibit the parasite enzyme.
- Atkinson GC, Tenson T, Hauryliuk V
- The RelA/SpoT homolog (RSH) superfamily: distribution and functional evolution of ppGpp synthetases and hydrolases across the tree of life.
- PLoS One. 2011; 6: 23479-23479
RelA/SpoT Homologue (RSH) proteins, named for their sequence similarity to the RelA and SpoT enzymes of Escherichia coli, comprise a superfamily of enzymes that synthesize and/or hydrolyze the alarmone ppGpp, activator of the "stringent" response and regulator of cellular metabolism. The classical "long" RSHs Rel, RelA and SpoT with the ppGpp hydrolase, synthetase, TGS and ACT domain architecture have been found across diverse bacteria and plant chloroplasts, while dedicated single domain ppGpp-synthesizing and -hydrolyzing RSHs have also been discovered in disparate bacteria and animals respectively. However, there is considerable confusion in terms of nomenclature and no comprehensive phylogenetic and sequence analyses have previously been carried out to classify RSHs on a genomic scale. We have performed high-throughput sensitive sequence searching of over 1000 genomes from across the tree of life, in combination with phylogenetic analyses to consolidate previous ad hoc identification of diverse RSHs in different organisms and provide a much-needed unifying terminology for the field. We classify RSHs into 30 subgroups comprising three groups: long RSHs, small alarmone synthetases (SASs), and small alarmone hydrolases (SAHs). Members of nineteen previously unidentified RSH subgroups can now be studied experimentally, including previously unknown RSHs in archaea, expanding the "stringent response" to this domain of life. We have analyzed possible combinations of RSH proteins and their domains in bacterial genomes and compared RSH content with available RSH knock-out data for various organisms to determine the rules of combining RSHs. Through comparative sequence analysis of long and small RSHs, we find exposed sites limited in conservation to the long RSHs that we propose are involved in transmitting regulatory signals. 
Such signals may be transmitted via NTD to CTD intra-molecular interactions, or inter-molecular interactions either among individual RSH molecules or among long RSHs and other binding partners such as the ribosome.
- Cohen-Gihon I, Fong JH, Sharan R, Nussinov R, Przytycka TM, Panchenko AR
- Evolution of domain promiscuity in eukaryotic genomes--a perspective from the inferred ancestral domain architectures.
- Mol Biosyst. 2011; 7: 784-92
Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.
- Huson DH, Scornavacca C
- A survey of combinatorial methods for phylogenetic networks.
- Genome Biol Evol. 2011; 3: 23-35
The evolutionary history of a set of species is usually described by a rooted phylogenetic tree. Although it is generally undisputed that bifurcating speciation events and descent with modifications are major forces of evolution, there is a growing belief that reticulate events also have a role to play. Phylogenetic networks provide an alternative to phylogenetic trees and may be more suitable for data sets where evolution involves significant amounts of reticulate events, such as hybridization, horizontal gene transfer, or recombination. In this article, we give an introduction to the topic of phylogenetic networks, very briefly describing the fundamental concepts and summarizing some of the most important combinatorial methods that are available for their computation.
- Aravind L, Abhiman S, Iyer LM
- Natural history of the eukaryotic chromatin protein methylation system.
- Prog Mol Biol Transl Sci. 2011; 101: 105-76
In eukaryotes, methylation of nucleosomal histones and other nuclear proteins is a central aspect of chromatin structure and dynamics. The past 15 years have seen an enormous advance in our understanding of the biochemistry of these modifications, and of their role in establishing the epigenetic code. We provide a synthetic overview, from an evolutionary perspective, of the main players in the eukaryotic chromatin protein methylation system, with an emphasis on catalytic domains. Several components of the eukaryotic protein methylation system had their origins in bacteria. In particular, the Rossmann fold protein methylases (PRMTs and DOT1), and the LSD1 and jumonji-related demethylases and oxidases, appear to have emerged in the context of bacterial peptide methylation and hydroxylation systems. These systems were originally involved in synthesis of peptide secondary metabolites, such as antibiotics, toxins, and siderophores. The peptidylarginine deiminases appear to have been acquired by animals from bacterial enzymes that modify cell-surface proteins. SET domain methylases, which display the beta-clip fold, apparently first emerged in prokaryotes from the SAF superfamily of carbohydrate-binding domains. However, even in bacteria, a subset of the SET domains might have evolved a chromatin-related role in conjunction with a BAF60a/b-like SWIB domain protein and topoisomerases. By the time of the last eukaryotic common ancestor, multiple SET and PRMT methylases were already in place and are likely to have mediated methylation at the H3K4, H3K9, H3K36, and H4K20 positions, and carried out both asymmetric and symmetric arginine dimethylation. Inference of H3K27 methylation in the ancestral eukaryote appears uncertain, though it was certainly in place a little later in eukaryotic evolution. Current data suggest that unlike SET methylases, which are universally present in eukaryotes, demethylases are not. 
They appear to be absent in the earliest-branching eukaryotic lineages, and emerged later along with several other chromatin proteins, such as the Dot1-methylase, prior to divergence of the kinetoplastid-heterolobosean lineage from the remaining eukaryotes. This period also corresponds to the point of origin of DNA cytosine methylation by DNMT1. Origin of major lineages of SET domains such as the Trithorax, Su(var)3-9, Ash1, SMYD, and TTLL12 and E(Z) might have played the initial role in the establishment of multiple distinct heterochromatic and euchromatic states that are likely to have been present, in some form, through much of eukaryotic evolution. Elaboration of these chromatin states might have gone hand-in-hand with acquisition of multiple jumonji-related and LSD1-like demethylases, and functional linkages with the DNA methylation and RNAi systems. Throughout eukaryotic evolution, there were several lineage-specific expansions of SET domain proteins, which might be related to a special transcription regulation process in trypanosomes, acquisition of new meiotic recombination hotspots in animals, and methylation and associated modifications of the diatom silaffin proteins involved in silica biomineralization. The use of specific domains to "read" the methylation marks appears to have been present in the ancestral eukaryote itself. Of these the chromo-like domains appear to have been acquired from bacterial secreted proteins that might have a role in binding cell-surface peptides or peptidoglycan. Domain architectures of the primary enzymes involved in the eukaryotic protein methylation system indicate key features relating to interactions with each other and other modifications in chromatin, such as acetylation. They also emphasize the profound functional distinction between the role of demethylation and deacetylation in regulation of chromatin dynamics.
- Bilewitch JP, Degnan SM
- A unique horizontal gene transfer event has provided the octocoral mitochondrial genome with an active mismatch repair gene that has potential for an unusual self-contained function.
- BMC Evol Biol. 2011; 11: 228-228
BACKGROUND: The mitochondrial genome of the Octocorallia has several characteristics atypical for metazoans, including a novel gene suggested to function in DNA repair. This mtMutS gene is favored for octocoral molecular systematics, due to its high information content. Several hypotheses concerning the origins of mtMutS have been proposed, and remain equivocal, although current weight of support is for a horizontal gene transfer from either an epsilonproteobacterium or a large DNA virus. Here we present new and compelling evidence on the evolutionary origin of mtMutS, and provide the very first data on its activity, functional capacity and stability within the octocoral mitochondrial genome. RESULTS: The mtMutS gene has the expected conserved amino acids, protein domains and predicted tertiary protein structure. Phylogenetic analysis indicates that mtMutS is not a member of the MSH family and therefore not of eukaryotic origin. MtMutS clusters closely with representatives of the MutS7 lineage; further support for this relationship derives from the sharing of a C-terminal endonuclease domain that confers a self-contained mismatch repair function. Gene expression analyses confirm that mtMutS is actively transcribed in octocorals. Rates of mitochondrial gene evolution in mtMutS-containing octocorals are lower than in their hexacoral sister-group, which lacks the gene, although paradoxically the mtMutS gene itself has higher rates of mutation than other octocoral mitochondrial genes. CONCLUSIONS: The octocoral mtMutS gene is active and codes for a protein with all the necessary components for DNA mismatch repair. A lower rate of mitochondrial evolution, and the presence of a nicking endonuclease domain, both indirectly support a theory of self-sufficient DNA mismatch repair within the octocoral mitochondrion. The ancestral affinity of mtMutS to non-eukaryotic MutS7 provides compelling support for an origin by horizontal gene transfer. 
The immediate vector of transmission into octocorals can be attributed to either an epsilonproteobacterium in an endosymbiotic association or to a viral infection, although DNA viruses are not currently known to infect both bacteria and eukaryotes, nor mitochondria in particular. In consolidating the first known case of HGT into an animal mitochondrial genome, these findings suggest the need for reconsideration of the means by which metazoan mitochondrial genomes evolve.
- Williams TA, Embley TM, Heinz E
- Informational gene phylogenies do not support a fourth domain of life for nucleocytoplasmic large DNA viruses.
- PLoS One. 2011; 6: 21080-21080
Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1000 genes) comparable to that of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent "fourth domain" of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally-similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.
- Fournier GP, Dick AA, Williams D, Gogarten JP
- Evolution of the Archaea: emerging views on origins and phylogeny.
- Res Microbiol. 2011; 162: 92-8
Of the three domains of life, the Archaea are the most recently discovered and, from the perspective of systematics, perhaps the least understood. More than three decades after their discovery, there is still no overwhelming consensus as to their phylogenetic status, with diverse evidence supporting in varying degrees their monophyly, paraphyly, or even polyphyly. As a further complication, their evolutionary history is inextricably linked to the origin of Eukarya, one of the most challenging problems in evolutionary biology. This exclusive relationship between the eukaryal nucleocytoplasm and the Archaea is further supported by a new methodology for rooting the ribosomal Tree of Life based on amino acid composition. Novel approaches such as utilizing horizontal gene transfers as synchronizing events and branch length analysis of deep paralogs will help to clarify temporal relationships between these lineages, and may prove useful in evaluating the numerous conflicting hypotheses related to the evolution of the Archaea and Eukarya.
- Kunisawa T
- Inference of the phylogenetic position of the phylum Deferribacteres from gene order comparison.
- Antonie Van Leeuwenhoek. 2011; 99: 417-22
The phylogenetic placement of the phylum Deferribacteres was investigated on the basis of gene order comparisons of completely sequenced bacterial genomes. Two completely sequenced Deferribacteres species share five sets of gene arrangements with a group of phyla, Proteobacteria, Aquificae, Planctomycetes, Spirochaetes, Bacteroidetes, Chlorobi, Acidobacteria, Verrucomicrobia, Elusimicrobia and Nitrospirae, while the other group of phyla, Synergistetes, Firmicutes, Actinobacteria, Thermotogae, Chloroflexi, Deinococcus-Thermus and Fusobacteria, shares alternative sets of gene arrangements, suggesting that the Deferribacteres is classified in the former group of phyla. Gene transfers that are thought to have occurred in a common ancestor of the Deferribacteres, Deltaproteobacteria and Nitrospirae exclusive of virtually all other phyla were identified, which suggests that the Deferribacteres is phylogenetically proximal to the Proteobacteria and Nitrospirae.
- Larson ET et al.
- The double-length tyrosyl-tRNA synthetase from the eukaryote Leishmania major forms an intrinsically asymmetric pseudo-dimer.
- J Mol Biol. 2011; 409: 159-76
The single tyrosyl-tRNA synthetase (TyrRS) gene in trypanosomatid genomes codes for a protein that is twice the length of TyrRS from virtually all other organisms. Each half of the double-length TyrRS contains a catalytic domain and an anticodon-binding domain; however, the two halves retain only 17% sequence identity to each other. The structural and functional consequences of this duplication and divergence are unclear. TyrRS normally forms a homodimer in which the active site of one monomer pairs with the anticodon-binding domain from the other. However, crystal structures of Leishmania major TyrRS show that, instead, the two halves of a single molecule form a pseudo-dimer resembling the canonical TyrRS dimer. Curiously, the C-terminal copy of the catalytic domain has lost the catalytically important HIGH and KMSKS motifs characteristic of class I aminoacyl-tRNA synthetases. Thus, the pseudo-dimer contains only one functional active site (contributed by the N-terminal half) and only one functional anticodon recognition site (contributed by the C-terminal half). Despite biochemical evidence for negative cooperativity between the two active sites of the usual TyrRS homodimer, previous structures have captured a crystallographically-imposed symmetric state. As the L. major TyrRS pseudo-dimer is inherently asymmetric, conformational variations observed near the active site may be relevant to understanding how the state of a single active site is communicated across the dimer interface. Furthermore, substantial differences between trypanosomal TyrRS and human homologs are promising for the design of inhibitors that selectively target the parasite enzyme.
- Zhang D, Aravind L
- Identification of novel families and classification of the C2 domain superfamily elucidate the origin and evolution of membrane targeting activities in eukaryotes.
- Gene. 2010; 469: 18-30
Eukaryotes contain an elaborate membrane system, which bounds the cell itself, nuclei, organelles and transient intracellular structures, such as vesicles. The emergence of this system was marked by an expansion of a number of structurally distinct classes of lipid-binding domains that could throw light on the early evolution of eukaryotic membranes. The C2 domain is a useful model to understand these events because it is one of the most prevalent eukaryotic lipid-binding domains deployed in diverse functional contexts. Most studies have concentrated on C2 domains prototyped by those in protein kinase C (PKC-C2) isoforms that bind lipid in a calcium-dependent manner. While two other distinct families of C2 domains, namely those in PI3K-C2 and PTEN-C2 are also recognized, a complete picture of evolutionary relationships within the C2 domain superfamily is lacking. We systematically studied this superfamily using sequence profile searches, phylogenetic and phyletic-pattern analysis and structure-prediction. Consequently, we identified several distinct families of C2 domains including those respectively typified by C2 domains in the Aida (axin interactor, dorsalization associated) proteins, B9 proteins (e.g. Mks1 (Xbx-7), Stumpy (Tza-1) and Tza-2) involved in centrosome migration and ciliogenesis, Dock180/Zizimin proteins which are Rac/CDC42 GDP exchange factors, the EEIG1/Sym-3, EHBP1 and plant RPG/PMI1 proteins involved in endocytotic recycling and organellar positioning and an apicomplexan family. We present evidence that the last eukaryotic common ancestor (LECA) contained at least 10 C2 domains belonging to 6 well-defined families. 
Further, we suggest that this pre-LECA diversification was linked to the emergence of several quintessentially eukaryotic structures, such as membrane repair and vesicular trafficking system, anchoring of the actin and tubulin cytoskeleton to the plasma and vesicular membranes, localization of small GTPases to membranes and lipid-based signal transduction. Subsequent lineage-specific expansions of Zizimin-type C2 domains and functionally linked CDC42/Rac GTPases occurred independently in eukaryotes that evolved active amoeboid motility. While two lipid-binding regions are likely to be shared by majority of C2 domains, the actual constellation of lipid-binding residues (predominantly basic) are distinct in each family potentially reflective of the functional and biochemical diversity of these domains. Importantly, we show that the calcium-dependent membrane interaction is a derived feature limited to the PKC-C2 domains. Our identification of novel C2 domains offers new insights into interaction between both the microtubular and microfilament cytoskeleton and cellular membranes.
- Gawryluk RM, Gray MW
- An ancient fission of mitochondrial Cox1.
- Mol Biol Evol. 2010; 27: 7-10
Many genes inherited from the alpha-proteobacterial ancestor of mitochondria have undergone evolutionary transfer to the nuclear genome in eukaryotes. In some rare cases, genes have been functionally transferred in pieces, resulting in split proteins that presumably interact in trans within mitochondria, fulfilling the same role as the ancestral, intact protein. We describe a nucleus-encoded mitochondrial protein (here named Cox1-c) in the amoeboid protist Acanthamoeba castellanii that is homologous to the C-terminal portion of conventional mitochondrial Cox1, whereas the corresponding portion of the mitochondrion-encoded A. castellanii Cox1 is absent. Bioinformatics searches retrieved nucleus-encoded Cox1-c homologs in most major eukaryotic supergroups; in these cases, also, the mitochondrion-encoded Cox1 lacks the corresponding C-terminal motif. These data constitute the first report of functional relocation of a portion of cox1 to the nucleus. This transfer event was likely ancient, with the resulting nuclear cox1-c being differentially activated across the eukaryotic domain.
- Krupovic M, Gribaldo S, Bamford DH, Forterre P
- The evolutionary history of archaeal MCM helicases: a case study of vertical evolution combined with hitchhiking of mobile genetic elements.
- Mol Biol Evol. 2010; 27: 2716-32
Genes encoding DNA replication proteins have been frequently exchanged between cells and mobile elements, such as viruses or plasmids. This raises potential problems to reconstruct their history. Here, we combine phylogenetic and genomic context analyses to study the evolution of the replicative minichromosome maintenance (MCM) helicases in Archaea. Several archaeal genomes encode more than one copy of the mcm gene. Genome context analysis reveals that most of these additional copies are encoded within mobile elements. Exhaustive analysis of these elements reveals diverse groups of integrated archaeal plasmids or viruses, including several head-and-tail proviruses. Some MCMs encoded by mobile elements are structurally distinct from their cellular counterparts, with one case of novel domain organization. Both genome context and phylogenetic analysis indicate that MCM encoded by mobile elements were recruited from cellular genomes. An accelerated evolution and a dramatic expansion of methanococcal MCMs suggest a host-to-virus-to-host transfer loop, possibly triggered by the loss of the archaeal initiator protein Cdc6 in Methanococcales. Surprisingly, despite extensive transfer of mcm genes between viruses, plasmids, and cells, the topology of the MCM tree is strikingly congruent with the consensus archaeal phylogeny, indicating that mobile elements encoding mcm have coevolved with their hosts and that DNA replication proteins can be also useful to reconstruct the history of the archaeal domain.
- Puigbo P, Wolf YI, Koonin EV
- The tree and net components of prokaryote evolution.
- Genome Biol Evol. 2010; 2: 745-56
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a "net of life" as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the "Forest of Life" (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
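The quartet-based comparison described in this abstract can be illustrated with a toy sketch. It assumes trees are given as nested tuples of leaf names, and it reports the fraction of resolved four-taxon subsets on which two trees agree. The function names are hypothetical, and this is not the authors' pipeline or their exact measure.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return (leaves, clades): all leaves under `tree` and the
    leaf set of every internal subtree, children first."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def quartet_topology(clades, quartet):
    """Induced split of four taxa as a frozenset of two pairs,
    or None if the tree does not resolve the quartet."""
    q = frozenset(quartet)
    for clade in clades:
        inside = q & clade
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_agreement(tree_a, tree_b):
    """Fraction of quartets (resolved in both trees) with the same topology."""
    leaves_a, clades_a = leaf_sets(tree_a)
    leaves_b, clades_b = leaf_sets(tree_b)
    shared = sorted(leaves_a & leaves_b)
    same = total = 0
    for quartet in combinations(shared, 4):
        top_a = quartet_topology(clades_a, quartet)
        top_b = quartet_topology(clades_b, quartet)
        if top_a is None or top_b is None:
            continue
        total += 1
        same += top_a == top_b
    return same / total if total else float("nan")
```

For example, `quartet_agreement(((("A", "B"), "C"), ("D", "E")), ((("A", "C"), "B"), ("D", "E")))` compares two five-taxon trees that differ only in the placement of B and C. Working on quartets keeps the comparison meaningful even when gene trees cover different species sets, since only the shared leaves are used.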
- Fujishima K, Sugahara J, Tomita M, Kanai A
- Large-scale tRNA intron transposition in the archaeal order Thermoproteales represents a novel mechanism of intron gain.
- Mol Biol Evol. 2010; 27: 2233-43
Recently, diverse arrangements of transfer RNA (tRNA) genes have been found in the domain Archaea, in which the tRNA is interrupted by a maximum of three introns or is even fragmented into two or three genes. Whereas most of the eukaryotic tRNA introns are inserted strictly at the canonical nucleotide position (37/38), archaeal intron-containing tRNAs have a wide diversity of small tRNA introns, which differ in their numbers and locations. This feature is especially pronounced in the archaeal order Thermoproteales. In this study, we performed a comprehensive sequence comparison of 286 tRNA introns and their genes in seven Thermoproteales species to clarify how these introns have emerged and diversified during tRNA gene evolution. We identified 46 intron groups containing sets of highly similar sequences (>70%) and showed that 16 of them contain sequences from evolutionarily distinct tRNA genes. The phylogeny of these 16 intron groups indicates that transposition events have occurred at least seven times throughout the evolution of Thermoproteales. These findings suggest that frequent intron transposition occurs among the tRNA genes of Thermoproteales. Further computational analysis revealed limited insertion positions and corresponding amino acid types of tRNA genes. This has arisen because the bulge-helix-bulge splicing motif is required at the newly transposed position if the pre-tRNA is to be correctly processed. These results clearly demonstrate a newly identified mechanism that facilitates the late gain of short introns at various noncanonical positions in archaeal tRNAs.
- Boyer M, Madoui MA, Gimenez G, La Scola B, Raoult D
- Phylogenetic and phyletic studies of informational genes in genomes highlight existence of a 4th domain of life including giant viruses.
- PLoS One. 2010; 5: 15530-15530
The discovery of Mimivirus, with its very large genome content, made it possible to identify genes common to the three domains of life (Eukarya, Bacteria and Archaea) and to generate controversial phylogenomic trees congruent with that of ribosomal genes, branching Mimivirus at its root. Here we used sequences from metagenomic databases, Marseillevirus and three new viruses extending the Mimiviridae family to generate the phylogenetic trees of eight proteins involved in different steps of DNA processing. Compared to the three ribosomal defined domains, we report a single common origin for Nucleocytoplasmic Large DNA Viruses (NCLDV), DNA processing genes rooted between Archaea and Eukarya, with a topology congruent with that of the ribosomal tree. As for translation, we found in our new viruses, together with Mimivirus, five proteins rooted deeply in the eukaryotic clade. In addition, comparison of informational genes repertoire based on phyletic pattern analysis supports existence of a clade containing NCLDVs clearly distinct from that of Eukarya, Bacteria and Archaea. We hypothesize that the core genome of NCLDV is as ancient as the three currently accepted domains of life.
- Projecto-Garcia J, Zorn N, Jollivet D, Schaeffer SW, Lallier FH, Hourdez S
- Origin and evolution of the unique tetra-domain hemoglobin from the hydrothermal vent scale worm Branchipolynoe.
- Mol Biol Evol. 2010; 27: 143-52
Hemoglobin is the most common respiratory pigment in annelids. It can be intra or extracellular, and this latter type can form large multimeric complexes. The hydrothermal vent scale worms Branchipolynoe symmytilida and Branchipolynoe seepensis express an extracellular tetra-domain hemoglobin (Hb) that is unique in annelids. We sequenced the gene for the single-domain and tetra-domain globins in these two species. The single-domain gene codes for a mature protein of 137 amino acids, and the tetra-domain gene codes for a mature protein of 552 amino acids. The single-domain gene has a typical three exon/two intron structure, with introns located at their typical positions (B12.2 and G7.0). This structure is repeated four times in the tetra-domain gene, with no bridge introns or linker sequences between domains. The phylogenetic position of Branchipolynoe globins among known annelid globins revealed that, although extracellular, they cluster within the annelid intracellular globins clade, suggesting that the extracellular state of these Hbs is the result of convergent evolution. The tetra-domain structure likely resulted from two tandem duplications, domain 1 giving rise to domain 2 and after this the two-domain gene duplicated to produce domains 3 and 4. The high O2 affinity of Branchipolynoe extracellular globins may be explained by the two key residues (B10Y and E7Q) in the heme pocket in each of the domains of the single and tetra-domain globins, which have been shown to be essential in the oxygen-avid Hb from the nematode Ascaris suum. This peculiar globin evolutionary path seems to be very different from other annelid extracellular globins and is most likely the product of evolutionary tinkering associated with the strong selective pressure to adapt to chronic hypoxia that characterizes hydrothermal vents.
- Nacher JC, Hayashida M, Akutsu T
- The role of internal duplication in the evolution of multi-domain proteins.
- Biosystems. 2010; 101: 127-35
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.
- Cocquyt E, Verbruggen H, Leliaert F, De Clerck O
- Evolution and cytological diversification of the green seaweeds (Ulvophyceae).
- Mol Biol Evol. 2010; 27: 2052-61
The Ulvophyceae, one of the four classes of the Chlorophyta, is of particular evolutionary interest because it features an unrivaled morphological and cytological diversity. Morphological types range from unicells and simple multicellular filaments to sheet-like and complex corticated thalli. Cytological layouts range from typical small cells containing a single nucleus and chloroplast to giant cells containing millions of nuclei and chloroplasts. In order to understand the evolution of these morphological and cytological types, the present paper aims to assess whether the Ulvophyceae are monophyletic and elucidate the ancient relationships among its orders. Our approach consists of phylogenetic analyses (maximum likelihood and Bayesian inference) of seven nuclear genes, small subunit nuclear ribosomal DNA and two plastid markers with carefully chosen partitioning strategies, and models of sequence evolution. We introduce a procedure for fast site removal (site stripping) targeted at improving phylogenetic signal in a particular epoch of interest and evaluate the specificity of fast site removal to retain signal about ancient relationships. From our phylogenetic analyses, we conclude that the ancestral ulvophyte likely was a unicellular uninucleate organism and that macroscopic growth was achieved independently in various lineages involving radically different mechanisms: either by evolving multicellularity with coupled mitosis and cytokinesis (Ulvales-Ulotrichales and Trentepohliales), by obtaining a multinucleate siphonocladous organization where every nucleus provides for its own cytoplasmic domain (Cladophorales and Blastophysa), or by developing a siphonous organization characterized by either one macronucleus or millions of small nuclei and cytoplasmic streaming (Bryopsidales and Dasycladales). 
We compare different evolutionary scenarios giving rise to siphonous and siphonocladous cytologies and argue that these did not necessarily evolve from a multicellular or even multinucleate state but instead could have evolved independently from a unicellular ancestor.
- Mallet LV, Becq J, Deschavanne P
- Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus.
- BMC Genomics. 2010; 11: 171-171
BACKGROUND: Numerous cases of horizontal transfers (HTs) have been described for eukaryote genomes, but in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specially designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple and tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. RESULTS: We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequences. However, the fraction of atypical DNA detected was smaller than the average amount detected in the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When analyzing the origin of these HTs by comparing their signatures to a home-made database of species signatures, 3 groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges are confirmed, we found evidence for very few exchanges between eukaryotic kingdoms. CONCLUSIONS: We demonstrated that HTs are not negligible in eukaryote genomes, though less extensive than in prokaryote genomes, bearing in mind that under our stringent conditions the amount detected is a floor value. The biological mechanisms underlying those transfers remain to be elucidated as well as the biological functions of the transferred genes.
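The idea of detecting atypical regions from local variations of a genomic signature can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: the signature is taken to be k-mer relative frequencies, the distance is Euclidean, and the window size, step, value of k and the mean-plus-standard-deviations cutoff are all arbitrary choices made here. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def signature(seq, k=4):
    """Relative frequency of every k-mer over ACGT; k-mers that
    contain other symbols are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def atypical_windows(genome, window=5000, step=2500, k=4, n_sd=3.0):
    """Return (start, distance) for windows whose signature lies more
    than n_sd standard deviations above the mean distance to the
    whole-genome signature."""
    ref = signature(genome, k)
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    dists = [(s, distance(signature(genome[s:s + window], k), ref)) for s in starts]
    values = [d for _, d in dists]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    cutoff = mean + n_sd * sd
    return [(s, d) for s, d in dists if d > cutoff]
```

Windows flagged this way are only candidates: a divergent signature can also reflect repeats or locally biased composition, so a comparison against signatures of potential donor species, as the abstract describes, would still be needed to argue for a transfer.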
- Havrylenko S, Legouis R, Negrutskii B, Mirande M
- Methionyl-tRNA synthetase from Caenorhabditis elegans: a specific multidomain organization for convergent functional evolution.
- Protein Sci. 2010; 19: 2475-84
Methionyl-tRNA synthetase (MetRS) is a multidomain protein that specifically binds tRNAMet and catalyzes the synthesis of methionyl-tRNAMet. The minimal, core enzyme found in Aquifex aeolicus is made of a catalytic domain, which catalyzes the aminoacylation reaction, and an anticodon-binding domain, which promotes tRNA-protein association. In eukaryotes, additional domains are appended in cis or in trans to the core enzyme and increase the stability of the tRNA-protein complexes. Eventually, as observed for MetRS from Homo sapiens, the C-terminal appended domain causes a slow release of aminoacyl-tRNA and establishes a limiting step in the global aminoacylation reaction. Here, we report that MetRS from the nematode Caenorhabditis elegans displays a new type of structural organization. Its very C-terminal appended domain is related to the oligonucleotide binding-fold-based tRNA-binding domain (tRBD) recovered at the C-terminus of MetRS from plant, but, in the nematode enzyme, this domain is separated from the core enzyme by an insertion domain. Gel retardation and tRNA aminoacylation experiments show that MetRS from nematode is functionally related to human MetRS despite the fact that their appended tRBDs have distinct structural folds, and are not orthologs. Thus, functional convergence of human and nematode MetRS is the result of parallel and convergent evolution that might have been triggered by the selective pressure to invent processivity of tRNA handling in translation in higher eukaryotes.
- Chia N, Cann I, Olsen GJ
- Evolution of DNA replication protein complexes in eukaryotes and Archaea.
- PLoS One. 2010; 5: 10866-10866
BACKGROUND: The replication of DNA in Archaea and eukaryotes requires several ancillary complexes, including proliferating cell nuclear antigen (PCNA), replication factor C (RFC), and the minichromosome maintenance (MCM) complex. Bacterial DNA replication utilizes comparable proteins, but these are distantly related phylogenetically to their archaeal and eukaryotic counterparts at best. METHODOLOGY/PRINCIPAL FINDINGS: While the structures of each of the complexes do not differ significantly between the archaeal and eukaryotic versions thereof, the evolutionary dynamic in the two cases does. The number of subunits in each complex is constant across all taxa. However, they vary subtly with regard to composition. In some taxa the subunits are all identical in sequence, while in others some are homologous rather than identical. In the case of eukaryotes, there is no phylogenetic variation in the makeup of each complex: all appear to derive from a common eukaryotic ancestor. This is not the case in Archaea, where the relationship between the subunits within each complex varies taxon-to-taxon. We have performed a detailed phylogenetic analysis of these relationships in order to better understand the gene duplications and divergences that gave rise to the homologous subunits in Archaea. CONCLUSION/SIGNIFICANCE: This domain level difference in evolution suggests that different forces have driven the evolution of DNA replication proteins in each of these two domains. In addition, the phylogenies of all three gene families support the distinctiveness of the proposed archaeal phylum Thaumarchaeota.
- Elias M
- Patterns and processes in the evolution of the eukaryotic endomembrane system.
- Mol Membr Biol. 2010; 27: 469-89
The eukaryotic endomembrane system (ES) is served by hundreds of dedicated proteins. Experimental characterization of the ES-associated molecular machinery in several model eukaryotes complemented by a recent progress in phylogenomics and comparative genomics have revealed a conserved complex core of the machinery that appears to have been established before the last eukaryotic common ancestor (LECA). At the same time, modern eukaryotes exhibit a huge variation in the ES resulting from a multitude of evolutionary processes operating along the ever-branching paths from the LECA to its descendants. The most important source of evolutionary novelty in the ES functioning has undoubtedly been gene duplication followed by divergence of the gene copies, responsible not only for the pre-LECA establishment of many multi-paralog families of proteins in the very core of the ES-associated machinery, but also for post-LECA lineage-specific elaborations via family expansions and the origin of novel components. Extreme sequence divergence has obscured actual homologous relationships between potentially many components of the machinery, even between orthologous proteins, as illustrated by the yeast Vps51 subunit of the vesicle tethering complex GARP hypothesized here to be a highly modified ortholog of a conserved eukaryotic family typified by the zebrafish Fat-free (Ffr) protein. A dynamic evolution of many ES-associated proteins, especially those centred around RAB and ARF GTPases, seems to take place at the level of their domain architectures. Finally, reductive evolution and recurrent gene loss are emerging as pervasive factors shaping the ES in all phylogenetic lineages.
- Klassen JL
- Phylogenetic and evolutionary patterns in microbial carotenoid biosynthesis are revealed by comparative genomics.
- PLoS One. 2010; 5: 11257-11257
BACKGROUND: Carotenoids are multifunctional, taxonomically widespread and biotechnologically important pigments. Their biosynthesis serves as a model system for understanding the evolution of secondary metabolism. Microbial carotenoid diversity and evolution has hitherto been analyzed primarily from structural and biosynthetic perspectives, with the few phylogenetic analyses of microbial carotenoid biosynthetic proteins either using limited datasets or lacking methodological rigor. Given the recent accumulation of microbial genome sequences, a reappraisal of microbial carotenoid biosynthetic diversity and evolution from the perspective of comparative genomics is warranted to validate and complement models of microbial carotenoid diversity and evolution based upon structural and biosynthetic data. METHODOLOGY/PRINCIPAL FINDINGS: Comparative genomics were used to identify and analyze in silico microbial carotenoid biosynthetic pathways. Four major phylogenetic lineages of carotenoid biosynthesis are suggested composed of: (i) Proteobacteria; (ii) Firmicutes; (iii) Chlorobi, Cyanobacteria and photosynthetic eukaryotes; and (iv) Archaea, Bacteroidetes and two separate sub-lineages of Actinobacteria. Using this phylogenetic framework, specific evolutionary mechanisms are proposed for carotenoid desaturase CrtI-family enzymes and carotenoid cyclases. Several phylogenetic lineage-specific evolutionary mechanisms are also suggested, including: (i) horizontal gene transfer; (ii) gene acquisition followed by differential gene loss; (iii) co-evolution with other biochemical structures such as proteorhodopsins; and (iv) positive selection. CONCLUSIONS/SIGNIFICANCE: Comparative genomics analyses of microbial carotenoid biosynthetic proteins indicate a much greater taxonomic diversity than that identified based on structural and biosynthetic data, and divide microbial carotenoid biosynthesis into several, well-supported phylogenetic lineages not evident previously. 
This phylogenetic framework is applicable to understanding the evolution of specific carotenoid biosynthetic proteins or the unique characteristics of carotenoid biosynthetic evolution in a specific phylogenetic lineage. Together, these analyses suggest a "bramble" model for microbial carotenoid biosynthesis whereby later biosynthetic steps exhibit greater evolutionary plasticity and reticulation compared to those closer to the biosynthetic "root". Structural diversification may be constrained ("trimmed") where selection is strong, but less so where selection is weaker. These analyses also highlight likely productive avenues for future research and bioprospecting by identifying both gaps in current knowledge and taxa which may particularly facilitate carotenoid diversification.
- Lundin D, Gribaldo S, Torrents E, Sjoberg BM, Poole AM
- Ribonucleotide reduction - horizontal transfer of a required function spans all three domains.
- BMC Evol Biol. 2010; 10: 383-383
BACKGROUND: Ribonucleotide reduction is the only de novo pathway for synthesis of deoxyribonucleotides, the building blocks of DNA. The reaction is catalysed by ribonucleotide reductases (RNRs), an ancient enzyme family comprised of three classes. Each class has distinct operational constraints, and the classes are broadly distributed across organisms from all three domains, though few class I RNRs have been identified in archaeal genomes, and classes II and III likewise appear rare across eukaryotes. In this study, we examine whether this distribution is best explained by presence of all three classes in the Last Universal Common Ancestor (LUCA), or by horizontal gene transfer (HGT) of RNR genes. We also examine to what extent environmental factors may have impacted the distribution of RNR classes. RESULTS: Our phylogenies show that the Last Eukaryotic Common Ancestor (LECA) possessed a class I RNR, but that the eukaryotic class I enzymes are not directly descended from class I RNRs in Archaea. Instead, our results indicate that archaeal class I RNR genes have been independently transferred from bacteria on two occasions. While LECA possessed a class I RNR, our trees indicate that this is ultimately bacterial in origin. We also find convincing evidence that eukaryotic class I RNR has been transferred to the Bacteroidetes, providing a stunning example of HGT from eukaryotes back to Bacteria. Based on our phylogenies and available genetic and genomic evidence, class II and III RNRs in eukaryotes also appear to have been transferred from Bacteria, with subsequent within-domain transfer between distantly-related eukaryotes. Under the three-domains hypothesis the RNR present in the last common ancestor of Archaea and eukaryotes appears, through a process of elimination, to have been a dimeric class II RNR, though limited sampling of eukaryotes precludes a firm conclusion as the data may be equally well accounted for by HGT. 
CONCLUSIONS: Horizontal gene transfer has clearly played an important role in the evolution of the RNR repertoire of organisms from all three domains of life. Our results clearly show that class I RNRs have spread to Archaea and eukaryotes via transfers from the bacterial domain, indicating that class I likely evolved in the Bacteria. However, against the backdrop of ongoing transfers, it is harder to establish whether class II or III RNRs were present in the LUCA, despite the fact that ribonucleotide reduction is an essential cellular reaction and was pivotal to the transition from RNA to DNA genomes. Instead, a general pattern of ongoing horizontal transmission emerges wherein environmental and enzyme operational constraints, especially the presence or absence of oxygen, are likely to be major determinants of the RNR repertoire of genomes.
- Perrin E et al.
- Exploring the HME and HAE1 efflux systems in the genus Burkholderia.
- BMC Evol Biol. 2010; 10: 164-164
BACKGROUND: The genus Burkholderia includes a variety of species with opportunistic human pathogenic strains, whose increasing global resistance to antibiotics has become a public health problem. In this context a major role could be played by multidrug efflux pumps belonging to the Resistance-Nodulation-Cell Division (RND) family, which allow bacterial cells to extrude a wide range of different substrates, including antibiotics. This study aims to i) identify rnd genes in the 21 available completely sequenced Burkholderia genomes, ii) analyze their phylogenetic distribution, iii) define the putative function(s) that RND proteins perform within the Burkholderia genus and iv) trace the evolutionary history of some of these genes in Burkholderia. RESULTS: BLAST analysis of the 21 Burkholderia sequenced genomes, using the experimentally characterized ceoB sequence (one of the RND family counterparts in the genus Burkholderia) as a probe, allowed the assembly of a dataset comprising 254 putative RND proteins. An extensive phylogenetic analysis revealed the occurrence of several independent events of gene loss and duplication across the different lineages of the genus Burkholderia, leading to notable differences in the number of paralogs between different genomes. A putative substrate [antibiotics (HAE1 proteins)/heavy-metal (HME proteins)] was also assigned to the majority of these proteins. No correlation was found between the ecological niche and the lifestyle of Burkholderia strains and the number/type of efflux pumps they possessed, while a relation can be found with genome size and taxonomy. Remarkably, we observed that HAE1 proteins are mainly responsible for the differences in the number of proteins observed among strains of the same species. Data concerning both the distribution and the phylogenetic analysis of the HAE1 and HME proteins in the Burkholderia genus allowed us to depict a likely evolutionary model accounting for the evolution and spread of HME and HAE1 systems in the Burkholderia genus. CONCLUSION: A complete knowledge of the presence and distribution of RND proteins in Burkholderia species was obtained and an evolutionary model was depicted. Data presented in this work may serve as a basis for future experimental tests, focused especially on HAE1 proteins, aimed at the identification of novel targets in antimicrobial therapy against Burkholderia species.
- Yutin N, Wolf MY, Wolf YI, Koonin EV
- The origins of phagocytosis and eukaryogenesis.
- Biol Direct. 2009; 4: 9-9
BACKGROUND: Phagocytosis, that is, engulfment of large particles by eukaryotic cells, is found in diverse organisms and is often thought to be central to the very origin of the eukaryotic cell, in particular, for the acquisition of bacterial endosymbionts including the ancestor of the mitochondrion. RESULTS: Comparisons of the sets of proteins implicated in phagocytosis in different eukaryotes reveal extreme diversity, with very few highly conserved components that typically do not possess readily identifiable prokaryotic homologs. Nevertheless, phylogenetic analysis of those proteins for which such homologs do exist yields clues to the possible origin of phagocytosis. The central finding is that a subset of archaea encode actins that are not only monophyletic with eukaryotic actins but also share unique structural features with actin-related proteins (Arp) 2 and 3. All phagocytic processes are strictly dependent on remodeling of the actin cytoskeleton and the formation of branched filaments for which Arp2/3 are responsible. The presence of common structural features in Arp2/3 and the archaeal actins suggests that the common ancestors of the archaeal and eukaryotic actins were capable of forming branched filaments, like modern Arp2/3. The Rho family GTPases that are ubiquitous regulators of phagocytosis in eukaryotes appear to be of bacterial origin, so assuming that the host of the mitochondrial endosymbiont was an archaeon, the genes for these GTPases come via horizontal gene transfer from the endosymbiont or in an earlier event. CONCLUSION: The present findings suggest a hypothetical scenario of eukaryogenesis under which the archaeal ancestor of eukaryotes had no cell wall (like modern Thermoplasma) but had an actin-based cytoskeleton including branched actin filaments that allowed this organism to produce actin-supported membrane protrusions. 
These protrusions would facilitate accidental, occasional engulfment of bacteria, one of which eventually became the mitochondrion. The acquisition of the endosymbiont triggered eukaryogenesis, in particular, the emergence of the endomembrane system that eventually led to the evolution of modern-type phagocytosis, independently in several eukaryotic lineages.
- Lewis AL et al.
- Innovations in host and microbial sialic acid biosynthesis revealed by phylogenomic prediction of nonulosonic acid structure.
- Proc Natl Acad Sci U S A. 2009; 106: 13552-7
Sialic acids (Sias) are nonulosonic acid (NulO) sugars prominently displayed on vertebrate cells and occasionally mimicked by bacterial pathogens using homologous biosynthetic pathways. It has been suggested that Sias were an animal innovation and later emerged in pathogens by convergent evolution or horizontal gene transfer. To better illuminate the evolutionary processes underlying the phenomenon of Sia molecular mimicry, we performed phylogenomic analyses of biosynthetic pathways for Sias and related higher sugars derived from 5,7-diamino-3,5,7,9-tetradeoxynon-2-ulosonic acids. Examination of approximately 1,000 sequenced microbial genomes indicated that such biosynthetic pathways are far more widely distributed than previously realized. Phylogenetic analysis, validated by targeted biochemistry, was used to predict NulO types (i.e., neuraminic, legionaminic, or pseudaminic acids) expressed by various organisms. This approach uncovered previously unreported occurrences of Sia pathways in pathogenic and symbiotic bacteria and identified at least one instance in which a human archaeal symbiont tentatively reported to express Sias in fact expressed the related pseudaminic acid structure. Evaluation of targeted phylogenies and protein domain organization revealed that the "unique" Sia biosynthetic pathway of animals was instead a much more ancient innovation. Pathway phylogenies suggest that bacterial pathogens may have acquired Sia expression via adaptation of pathways for legionaminic acid biosynthesis, one of at least 3 evolutionary paths for de novo Sia synthesis. Together, these data indicate that some of the long-standing paradigms in Sia biology should be reconsidered in a wider evolutionary context of the extended family of NulO sugars.
- Sorrels CM, Proteau PJ, Gerwick WH
- Organization, evolution, and expression analysis of the biosynthetic gene cluster for scytonemin, a cyanobacterial UV-absorbing pigment.
- Appl Environ Microbiol. 2009; 75: 4861-9
Cyanobacteria are photosynthetic prokaryotes capable of protecting themselves from UV radiation through the biosynthesis of UV-absorbing secondary metabolites, such as the mycosporines and scytonemin. Scytonemin, a novel indolic-phenolic pigment, is found sequestered in the sheath, where it provides protection to the subtending cells during exposure to UV radiation. The biosynthesis of scytonemin is encoded by a previously identified gene cluster that is present in six cyanobacterial species whose genomes are available. A comparison of these clusters reveals that two major cluster architectures exist which appear to have evolved through rearrangements of large sections, such as those genes responsible for aromatic amino acid biosynthesis and through the insertion of genes that potentially confer additional biosynthetic capabilities. Differential transcriptional expression analysis demonstrated that the entire gene cluster is transcribed in higher abundance after exposure to UV radiation. This analysis helps delineate the cluster boundaries and indicates that regulation of this cluster is controlled by the presence or absence of UV radiation. The findings from an evolutionary phylogenetic analysis combined with the fact that the scytonemin gene cluster is distributed across several cyanobacterial lineages led to our proposal that the distribution of this gene cluster is best explained through an ancient evolutionary origin.
- Makarova KS, Wolf YI, van der Oost J, Koonin EV
- Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements.
- Biol Direct. 2009; 4: 29-29
BACKGROUND: In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family, some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown. RESULTS: We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases.
Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain. CONCLUSION: The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
- Yutin N, Koonin EV
- Evolution of DNA ligases of nucleo-cytoplasmic large DNA viruses of eukaryotes: a case of hidden complexity.
- Biol Direct. 2009; 4: 51-51
BACKGROUND: Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) encode most if not all of the enzymes involved in their DNA replication. It has been inferred that genes for these enzymes were already present in the last common ancestor of the NCLDV. However, the details of the evolution of these genes that bear on the complexity of the putative ancestral NCLDV and on the evolutionary relationships between viruses and their hosts are not well understood. RESULTS: Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded only by a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses) but phylogenetic analysis clearly indicates that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes, this suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of ATP-dependent ligases that are encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliania huxleyi virus, belongs to the eukaryotic DNA ligase I branch; and ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III. CONCLUSIONS: Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that most likely was acquired from a bacteriophage at the early stages of evolution of eukaryotes.
By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes. REVIEWERS: This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin.
- Wout PK, Sattlegger E, Sullivan SM, Maddock JR
- Saccharomyces cerevisiae Rbg1 protein and its binding partner Gir2 interact on Polyribosomes with Gcn1.
- Eukaryot Cell. 2009; 8: 1061-71
Rbg1 is a previously uncharacterized protein of Saccharomyces cerevisiae belonging to the Obg/CgtA subfamily of GTP-binding proteins, whose members are involved in ribosome function in both prokaryotes and eukaryotes. We show here that Rbg1 specifically associates with translating ribosomes. In addition, yeast two-hybrid screening identified proteins that interact with Rbg1, including Tma46, Ygr250c, Yap1, and Gir2. Gir2 contains a GI (Gcn2 and Impact) domain similar to that of Gcn2, an essential factor of the general amino acid control pathway required for overcoming amino acid shortage. Interestingly, we found that Gir2, like Gcn2, interacts with Gcn1 through its GI domain, and overexpression of Gir2, under conditions mimicking amino acid starvation, resulted in inhibition of growth that could be reversed by Gcn2 co-overexpression. Moreover, we found that Gir2 also cofractionated with polyribosomes, and this fractionation pattern was partially dependent on the presence of Gcn1. Based on these findings, we conclude that Rbg1 and its interacting partner Gir2 associate with ribosomes, and their possible biological roles are discussed.
- Frenkel-Morgenstern M, Tworowski D, Klipcan L, Safro M
- Intra-protein compensatory mutations analysis highlights the tRNA recognition regions in aminoacyl-tRNA synthetases.
- J Biomol Struct Dyn. 2009; 27: 115-26
The aminoacyl-tRNA synthetases (aaRSs) covalently attach amino acids to their corresponding nucleic acid adapter molecules, tRNAs. The interactions in the tRNA-aaRS complexes are mostly non-specific, and largely electrostatic. Tracing the mutual adaptation of aaRSs and tRNAs throughout evolution offers a clearer view of how aaRS-tRNA systems preserve patterns of tRNA recognition and binding. In this study, we used compensatory mutation analysis to explore adaptation of aaRSs in response to random mutations that can occur in the tRNA-recognition area. We showed that the frequency of compensatory mutations among residues that belong to the recognition region is 1.75-fold higher than that of the exposed residues. The highest frequencies of compensatory mutations are observed for pairs of charged residues, wherein one residue is located within the tRNA-recognition area, while the second is placed outside of the area, and contributes to the formation of the aaRS electrostatic landscape. Given charged residues are compensated by buried charged residues in more than 60% of the analyzed mutations. The cytoplasmic and mitochondrial aaRSs preserve similar patterns of compensatory mutations in the tRNA recognition areas. Moreover, we found that mitochondrial aaRSs demonstrate a significant increase in the frequency of compensatory mutations in the area. Our findings shed light on the physical nature of compensatory mutations in aaRSs, which keep tRNA-recognition patterns unchanged.
- Wydau S, van der Rest G, Aubard C, Plateau P, Blanquet S
- Widespread distribution of cell defense against D-aminoacyl-tRNAs.
- J Biol Chem. 2009; 284: 14096-104
Several L-aminoacyl-tRNA synthetases can transfer a D-amino acid onto their cognate tRNA(s). This harmful reaction is counteracted by the enzyme D-aminoacyl-tRNA deacylase. Two distinct deacylases had previously been identified, in bacteria (DTD1) and in archaea (DTD2). Evidence was given that DTD1 homologs also exist in nearly all eukaryotes, whereas DTD2 homologs occur in plants. On the other hand, several bacteria, including most cyanobacteria, lack genes encoding a DTD1 homolog. Here we show that Synechocystis sp. PCC6803 produces a third type of deacylase (DTD3). Inactivation of the corresponding gene (dtd3) renders the growth of Synechocystis sp. hypersensitive to the presence of D-tyrosine. Based on the available genomes, DTD3-like proteins are predicted to occur in all cyanobacteria. Moreover, one or several dtd3-like genes can be recognized in all cellular types, arguing in favor of the near-ubiquity of an enzymatic function involved in the defense of translational systems against invasion by D-amino acids.
- Chang KM, Hendrickson TL
- Recognition of tRNA(Gln) by Helicobacter pylori GluRS2--a tRNA(Gln)-specific glutamyl-tRNA synthetase.
- Nucleic Acids Res. 2009; 37: 6942-9
Accurate aminoacylation of tRNAs by the aminoacyl-tRNA synthetases (aaRSs) plays a critical role in protein translation. However, some of the aaRSs are missing in many microorganisms. Helicobacter pylori does not have a glutaminyl-tRNA synthetase (GlnRS) but has two divergent glutamyl-tRNA synthetases: GluRS1 and GluRS2. Like a canonical GluRS, GluRS1 aminoacylates tRNA(Glu1) and tRNA(Glu2). In contrast, GluRS2 only misacylates tRNA(Gln) to form Glu-tRNA(Gln). It is not clear how GluRS2 achieves specific recognition of tRNA(Gln) while rejecting the two H. pylori tRNA(Glu) isoacceptors. Here, we show that GluRS2 recognizes major identity elements clustered in the tRNA(Gln) acceptor stem. Mutations in the tRNA anticodon or at the discriminator base had little to no impact on enzyme specificity and activity.
- Choi K, Gomez SM
- Comparison of phylogenetic trees through alignment of embedded evolutionary distances.
- BMC Bioinformatics. 2009; 10: 423-423
BACKGROUND: The understanding of evolutionary relationships is a fundamental aspect of modern biology, with the phylogenetic tree being a primary tool for describing these associations. However, comparison of trees for the purpose of assessing similarity and the quantification of various biological processes remains a significant challenge. RESULTS: We describe a novel approach for the comparison of phylogenetic distance information based on the alignment of representative high-dimensional embeddings (xCEED: Comparison of Embedded Evolutionary Distances). The xCEED methodology, which utilizes multidimensional scaling and Procrustes-related superimposition approaches, provides the ability to measure the global similarity between trees as well as incongruities between them. We demonstrate the application of this approach to the prediction of coevolving protein interactions and demonstrate its improved performance over the mirrortree, tol-mirrortree, phylogenetic vector projection, and partial correlation approaches. Furthermore, we show its applicability to both the detection of horizontal gene transfer events as well as its potential use in the prediction of interaction specificity between a pair of multigene families. CONCLUSIONS: These approaches provide additional tools for the study of phylogenetic trees and associated evolutionary processes. Source code is available at http://gomezlab.bme.unc.edu/tools.
- Zhang XC, Cannon SB, Stacey G
- Evolutionary genomics of LysM genes in land plants.
- BMC Evol Biol. 2009; 9: 183-183
BACKGROUND: The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. RESULTS: We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. CONCLUSION: We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in the plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.
- McCrow JP
- Alignment of phylogenetically unambiguous indels in Shewanella.
- J Comput Biol. 2009; 16: 1517-28
High levels of alignment errors associated with gaps have generally meant their exclusion from phylogenetic analysis. Conserved inserts and deletions (indels) may in some cases be less subject to errors than amino acid substitutions for inferring the history of genomes and identifying recently laterally transferred genes, but alignment error near gaps must be evaluated prior to using indels as phylogenetic characters. A method is presented for evaluating the phylogenetic unambiguity of gaps in multiple sequence alignments by allowing a defined amount of pairwise alignment ambiguity. This work considers the bacterial genus Shewanella, which is of particular interest for applications of bioremediation and environmental engineering. Understanding the genetic history of these species is vital for these applications. A set of pairwise dynamic programming alignments is constructed to test positions in multiple alignments for phylogenetic unambiguity, and a whole genome scan is done on protein sequences from 11 sequenced species of the bacterial genus Shewanella. The splits defined by phylogenetically unambiguous indels are then used as characters for phylogenetic analysis, and results are compared to whole genome Maximum Likelihood phylogeny. A comparable description of the history of the species is found, as well as a set of lateral gene transfer candidates undetectable by traditional analysis of amino acid substitutions. This analysis is applicable to other taxonomic units at all levels and has the potential to allow cataloging of clear genome-wide phylogenetic markers for taxonomic profiling down to the species level.
- Morris PF, Schlosser LR, Onasch KD, Wittenschlaeger T, Austin R, Provart N
- Multiple horizontal gene transfer events and domain fusions have created novel regulatory and metabolic networks in the oomycete genome.
- PLoS One. 2009; 4: 6133-6133
Complex enzymes with multiple catalytic activities are hypothesized to have evolved from more primitive precursors. Global analysis of the Phytophthora sojae genome using conservative criteria for evaluation of complex proteins identified 273 novel multifunctional proteins that were also conserved in P. ramorum. Each of these proteins contains combinations of protein motifs that are not present in bacterial, plant, animal, or fungal genomes. A subset of these proteins was also identified in the two diatom genomes, but the majority of these proteins have formed after the split between diatoms and oomycetes. Documentation of multiple cases of domain fusions that are common to both oomycete and diatom genomes lends additional support for the hypothesis that oomycetes and diatoms are monophyletic. Bifunctional proteins that catalyze two steps in a metabolic pathway can be used to infer the interaction of orthologous proteins that exist as separate entities in other genomes. We postulated that the novel multifunctional proteins of oomycetes could function as potential Rosetta Stones to identify interacting proteins of conserved metabolic and regulatory networks in other eukaryotic genomes. However, ortholog analysis of each domain within our set of 273 multifunctional proteins against 39 sequenced bacterial and eukaryotic genomes identified only 18 candidate Rosetta Stone proteins. Thus the majority of multifunctional proteins are not Rosetta Stones, but they may nonetheless be useful in identifying novel metabolic and regulatory networks in oomycetes. Phylogenetic analysis of all the enzymes in three pathways with one or more novel multifunctional proteins was conducted to determine the probable origins of individual enzymes. These analyses revealed multiple examples of horizontal transfer from both bacterial genomes and the photosynthetic endosymbiont in the ancestral genome of Stramenopiles. The complexity of the phylogenetic origins of these metabolic pathways and the paucity of Rosetta Stones relative to the total number of multifunctional proteins suggest that the proteome of oomycetes has few features in common with other kingdoms.
- Glazer AN, Kechris KJ
- Conserved amino acid sequence features in the alpha subunits of MoFe, VFe, and FeFe nitrogenases.
- PLoS One. 2009; 4: 6136-6136
BACKGROUND: This study examines the structural features and phylogeny of the alpha subunits of 69 full-length NifD (MoFe subunit), VnfD (VFe subunit), and AnfD (FeFe subunit) sequences. METHODOLOGY/PRINCIPAL FINDINGS: The analyses of this set of sequences included BLAST scores, multiple sequence alignment, examination of patterns of covariant residues, phylogenetic analysis and comparison of the sequences flanking the conserved Cys and His residues that attach the FeMo cofactor to NifD and that are also conserved in the alternative nitrogenases. The results show that NifD nitrogenases fall into two distinct groups. Group I includes NifD sequences from many genera within Bacteria, including all nitrogen-fixing aerobes examined, as well as strict anaerobes and some facultative anaerobes, but no archaeal sequences. In contrast, Group II NifD sequences were limited to a small number of archaeal and bacterial sequences from strict anaerobes. The VnfD and AnfD sequences fall into two separate groups, more closely related to Group II NifD than to Group I NifD. The pattern of perfectly conserved residues, distributed along the full length of the Group I and II NifD, VnfD, and AnfD, confirms unambiguously that these polypeptides are derived from a common ancestral sequence. CONCLUSIONS/SIGNIFICANCE: There is no indication of a relationship between the patterns of covariant residues specific to each of the four groups discussed above that would give indications of an evolutionary pathway leading from one type of nitrogenase to another. Rather the totality of the data, along with the phylogenetic analysis, is consistent with a radiation of Group I and II NifDs, VnfD and AnfD from a common ancestral sequence. All the data presented here strongly support the suggestion made by some earlier investigators that the nitrogenase family had already evolved in the last common ancestor of the Archaea and Bacteria.
- Bonner CA et al.
- Cohesion group approach for evolutionary analysis of TyrA, a protein family with wide-ranging substrate specificities.
- Microbiol Mol Biol Rev. 2008; 72: 13-53
Many enzymes and other proteins are difficult subjects for bioinformatic analysis because they exhibit variant catalytic, structural, regulatory, and fusion mode features within a protein family whose sequences are not highly conserved. However, such features reflect dynamic and interesting scenarios of evolutionary importance. The value of experimental data obtained from individual organisms is instantly magnified to the extent that given features of the experimental organism can be projected upon related organisms. But how can one decide how far along the similarity scale it is reasonable to go before such inferences become doubtful? How can a credible picture of evolutionary events be deduced within the vertical trace of inheritance in combination with intervening events of lateral gene transfer (LGT)? We present a comprehensive analysis of a dehydrogenase protein family (TyrA) as a prototype example of how these goals can be accomplished through the use of cohesion group analysis. With this approach, the full collection of homologs is sorted into groups by a method that eliminates bias caused by an uneven representation of sequences from organisms whose phylogenetic spacing is not optimal. Each sufficiently populated cohesion group is phylogenetically coherent and defined by an overall congruence with a distinct section of the 16S rRNA gene tree. Exceptions that occasionally are found implicate a clearly defined LGT scenario whereby the recipient lineage is apparent and the donor lineage of the gene transferred is localized to those organisms that define the cohesion group. Systematic procedures to manage and organize otherwise overwhelming amounts of data are demonstrated.
- Intra J, Pavesi G, Horner DS
- Phylogenetic analyses suggest multiple changes of substrate specificity within the glycosyl hydrolase 20 family.
- BMC Evol Biol. 2008; 8: 214-214
BACKGROUND: Beta-N-acetylhexosaminidases belonging to the glycosyl hydrolase 20 (GH20) family are involved in the removal of terminal beta-glycosidically linked N-acetylhexosamine residues. These enzymes, widely distributed in microorganisms, animals and plants, are involved in many important physiological and pathological processes, such as cell structural integrity, energy storage, pathogen defence, viral penetration, cellular signalling, fertilization, development of carcinomas, inflammatory events and lysosomal storage diseases. Nevertheless, only limited analyses of phylogenetic relationships between GH20 genes have been performed until now. RESULTS: Careful phylogenetic analyses of 233 inferred protein sequences from eukaryotes and prokaryotes reveal a complex history for the GH20 family. In bacteria, multiple gene duplications and lineage specific gene loss (and/or horizontal gene transfer) are required to explain the observed taxonomic distribution. The last common ancestor of extant eukaryotes is likely to have possessed at least one GH20 family member. At least one gene duplication before the divergence of animals, plants and fungi as well as other lineage specific duplication events have given rise to multiple paralogous subfamilies in eukaryotes. Phylogenetic analyses also suggest that a second, divergent subfamily of GH20 family genes present in animals derive from an independent prokaryotic source. Our data suggest multiple convergent changes of functional roles of GH20 family members in eukaryotes. CONCLUSION: This study represents the first detailed evolutionary analysis of the glycosyl hydrolase GH20 family. Mapping of data concerning physiological function of GH20 family members onto the phylogenetic tree reveals that apparently convergent and highly lineage specific changes in substrate specificity have occurred in multiple GH20 subfamilies.
- Cheng H, Kim BH, Grishin NV
- Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets.
- J Mol Biol. 2008; 377: 1265-78
A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
- Di Giulio M
- The origin of genes could be polyphyletic.
- Gene. 2008; 426: 39-46
The paradigm of the monophyletic origin of genes is deeply rooted in us all. For instance, this stems from the observation that the possibility of obtaining a good multiple alignment using the same protein from organisms from the three domains of life (Bacteria, Archaea and Eukarya) would seem to imply that the last universal common ancestor (LUCA) must have had that protein and, therefore, the origin of that gene must necessarily be monophyletic. The hypothesis of a polyphyletic origin of genes has to explain how it was possible to evolve highly conserved regions of multiple alignments of orthologous proteins from the three domains of life when these regions clearly seem to define a monophyletic origin of genes. If mRNAs were assembled at the stage of the LUCA through the trans-splicing of pieces of RNA representing mini-genes, and the translation of these mRNAs resulted in proteins whose genes (DNA) actually only evolved much later, i.e. only after the main domains of life were established, then this would explain why multiple alignments of orthologous proteins can be obtained from the three domains of life. Therefore, this makes these multiple alignments compatible with a polyphyletic origin of genes. I have analysed many multiple alignments of orthologous proteins from the three domains of life, reaching a conclusion that seems to suggest that these alignments are also compatible with a polyphyletic origin of genes because, for instance, they contain protein motifs characterising the domains of life. These motifs, and also genes, might have evolved late on, thus making their polyphyletic origin likely.
- Hausmann CD, Ibba M
- Aminoacyl-tRNA synthetase complexes: molecular multitasking revealed.
- FEMS Microbiol Rev. 2008; 32: 705-21
The accurate synthesis of proteins, dictated by the corresponding nucleotide sequence encoded in mRNA, is essential for cell growth and survival. Central to this process are the aminoacyl-tRNA synthetases (aaRSs), which provide amino acid substrates for the growing polypeptide chain in the form of aminoacyl-tRNAs. The aaRSs are essential for coupling the correct amino acid and tRNA molecules, but are also known to associate in higher order complexes with proteins involved in processes beyond translation. Multiprotein complexes containing aaRSs are found in all three domains of life playing roles in splicing, apoptosis, viral assembly, and regulation of transcription and translation. An overview of the complexes aaRSs form in all domains of life is presented, demonstrating the extensive network of connections between the translational machinery and cellular components involved in a myriad of essential processes beyond protein synthesis.
- Boussau B, Gueguen L, Gouy M
- Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria.
- BMC Evol Biol. 2008; 8: 272-272
BACKGROUND: Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria. RESULTS: To get a whole genome view of Aquifex relationships, all trees containing sequences from Aquifex in the HOGENOM database were surveyed. This study revealed that Aquifex is most often found as a neighbour to Thermotogales. Moreover, informational genes, which appeared to be less often transferred to the Aquifex lineage than non-informational genes, most often placed Aquificales close to Thermotogales. To ensure these results did not come from long branch attraction or compositional artefacts, a subset of carefully chosen proteins from a wide range of bacterial species was selected for further scrutiny. Among these genes, two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one with epsilon-Proteobacteria. We characterized the genes that supported each of these two hypotheses, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two incongruent phylogenetic signals in the alignment. Instead, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found. 
CONCLUSION: Methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions regarding the position of Aquificales because this lineage has undergone many horizontal gene transfers. However, if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales.
- Soria-Carrasco V, Castresana J
- Estimation of phylogenetic inconsistencies in the three domains of life.
- Mol Biol Evol. 2008; 25: 2319-29
Discrepancies in phylogenetic trees of bacteria and archaea are often explained as lateral gene transfer events. However, such discrepancies may also be due to phylogenetic artifacts or orthology assignment problems. A first step that may help to resolve this dilemma is to estimate the extent of phylogenetic inconsistencies in trees of prokaryotes in comparison with those of higher eukaryotes, where no lateral gene transfer is expected. To test this, we used 21 proteomes each of eukaryotes (mainly opisthokonts), proteobacteria, and archaea that spanned equivalent levels of genetic divergence. In each domain of life, we defined a set of putative orthologous sequences using a phylogenetic-based orthology protocol and, as a reference topology, we used a tree constructed with concatenated genes of each domain. Our results show, for most of the tests performed, that the magnitude of topological inconsistencies with respect to the reference tree was very similar in the trees of proteobacteria and eukaryotes. When clade support was taken into account, prokaryotes showed some more inconsistencies, but then all values were very low. Discrepancies were only consistently higher in archaea but, as shown by simulation analysis, this is likely due to the particular tree of the archaeal species used here being more difficult to reconstruct, whereas the trees of proteobacteria and eukaryotes were of similar difficulty. Although these results are based on a relatively small number of genes, it seems that phylogenetic reconstruction problems, including orthology assignment problems, have a similar overall effect over prokaryotic and eukaryotic trees based on single genes. Consequently, lateral gene transfer between distant prokaryotic species may have been more rare than previously thought, which opens the way to obtain the tree of life of bacterial and archaeal species using genomic data and the concatenation of adequate genes, in the same way as it is usually done in eukaryotes.
- Bailly X, Vanin S, Chabasse C, Mizuguchi K, Vinogradov SN
- A phylogenomic profile of hemerythrins, the nonheme diiron binding respiratory proteins.
- BMC Evol Biol. 2008; 8: 244-244
BACKGROUND: Hemerythrins are the non-heme, diiron binding respiratory proteins of brachiopods, priapulids and sipunculans; they are also found in annelids and bacteria, where their functions have not been fully elucidated. RESULTS: A search for putative Hrs in the genomes of 43 archaea, 444 bacteria and 135 eukaryotes revealed their presence in 3 archaea, 118 bacteria, several fungi, one apicomplexan, a heterolobosan, a cnidarian and several annelids. About a fourth of the Hr sequences were identified as N- or C-terminal domains of chimeric, chemotactic gene regulators. The function of the remaining single domain bacterial Hrs remains to be determined. In addition to oxygen transport, the possible functions in annelids have been proposed to include cadmium-binding, antibacterial action and immunoprotection. A Bayesian phylogenetic tree revealed a split into two clades, one encompassing archaea, bacteria and fungi, and the other comprising the remaining eukaryotes. The annelid and sipunculan Hrs share the same intron-exon structure, different from that of the cnidarian Hr. CONCLUSION: The phylogenomic profile of Hrs demonstrated a limited occurrence in bacteria and archaea and a marked absence in the vast majority of multicellular organisms. Among the metazoa, Hrs have survived in a cnidarian and in a few protostome groups; hence, it appears that in metazoans the Hr gene was lost in deuterostome ancestor(s) after the radiata/bilateria split. Signal peptide sequences in several Hirudinea Hrs suggest, for the first time, the possibility of extracellular localization. Since the alpha-helical bundle is likely to have been among the earliest protein folds, Hrs represent an ancient family of iron-binding proteins, whose primary function in bacteria may have been that of an oxygen sensor, enabling aerophilic or aerophobic responses.
Although Hrs evolved to function as O2 transporters in brachiopods, priapulids and sipunculans, their function in annelids remains to be elucidated. Overall Hrs exhibit a considerable lack of evolutionary success in metazoans.
- Sheppard K, Soll D
- On the evolution of the tRNA-dependent amidotransferases, GatCAB and GatDE.
- J Mol Biol. 2008; 377: 831-44
Glutaminyl-tRNA synthetase and asparaginyl-tRNA synthetase evolved from glutamyl-tRNA synthetase and aspartyl-tRNA synthetase, respectively, after the split in the last universal communal ancestor (LUCA). Glutaminyl-tRNA(Gln) and asparaginyl-tRNA(Asn) were likely formed in LUCA by amidation of the mischarged species, glutamyl-tRNA(Gln) and aspartyl-tRNA(Asn), by tRNA-dependent amidotransferases, as is still the case in most bacteria and all known archaea. The amidotransferase GatCAB is found in both domains of life, while the heterodimeric amidotransferase GatDE is found only in Archaea. The GatB and GatE subunits belong to a unique protein family that includes Pet112 that is encoded in the nuclear genomes of numerous eukaryotes. GatE was thought to have evolved from GatB after the emergence of the modern lines of descent. Our phylogenetic analysis, though, places the split between GatE and GatB prior to the phylogenetic divide between Bacteria and Archaea, and indicates that Pet112 is of mitochondrial origin. In addition, GatD appears to have emerged prior to the bacterial-archaeal phylogenetic divide. Thus, while GatDE is an archaeal signature protein, it likely was present in LUCA together with GatCAB. Archaea retained both amidotransferases, while Bacteria emerged with only GatCAB. The presence of GatDE has favored a unique archaeal tRNA(Gln) that may be preventing the acquisition of glutaminyl-tRNA synthetase in Archaea. Archaeal GatCAB, on the other hand, has not favored a distinct tRNA(Asn), suggesting that tRNA(Asn) recognition is not a major barrier to the retention of asparaginyl-tRNA synthetase in many Archaea.
- Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV
- Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea.
- Biol Direct. 2007; 2: 33-33
BACKGROUND: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. RESULTS: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared to approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively.
It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. CONCLUSION: The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
- Burroughs AM, Balaji S, Iyer LM, Aravind L
- Small but versatile: the extraordinary functional and structural diversity of the beta-grasp fold.
- Biol Direct. 2007; 2: 18-18
BACKGROUND: The beta-grasp fold (beta-GF), prototyped by ubiquitin (UB), has been recruited for a strikingly diverse range of biochemical functions. These functions include providing a scaffold for different enzymatic active sites (e.g. NUDIX phosphohydrolases) and iron-sulfur clusters, RNA-soluble-ligand and co-factor-binding, sulfur transfer, adaptor functions in signaling, assembly of macromolecular complexes and post-translational protein modification. To understand the basis for the functional versatility of this small fold we undertook a comprehensive sequence-structure analysis of the fold and developed a natural classification for its members. RESULTS: As a result we were able to define the core distinguishing features of the fold and numerous elaborations, including several previously unrecognized variants. Systematic analysis of all known interactions of the fold showed that its manifold functional abilities arise primarily from the prominent beta-sheet, which provides an exposed surface for diverse interactions or additionally, by forming open barrel-like structures. We show that in the beta-GF both enzymatic activities and the binding of diverse co-factors (e.g. molybdopterin) have independently evolved on at least three occasions each, and iron-sulfur-cluster-binding on at least two independent occasions. Our analysis identified multiple previously unknown large monophyletic assemblages within the beta-GF, including one which unifies versions found in the fasciclin-1 superfamily, the ribosomal protein L25, the phosphoribosyl AMP cyclohydrolase (HisI) and glutamine synthetase. We also uncovered several new groups of beta-GF domains including a domain found in bacterial flagellar and fimbrial assembly components, and 5 new UB-like domains in the eukaryotes. 
CONCLUSION: Evolutionary reconstruction indicates that the beta-GF had differentiated into at least 7 distinct lineages by the time of the last universal common ancestor of all extant organisms, encompassing much of the structural diversity observed in extant versions of the fold. The earliest beta-GF members were probably involved in RNA metabolism and subsequently radiated into various functional niches. Most of the structural diversification occurred in the prokaryotes, whereas the eukaryotic phase was mainly marked by a specific expansion of the ubiquitin-like beta-GF members. The eukaryotic UB superfamily diversified into at least 67 distinct families, of which at least 19-20 families were already present in the eukaryotic common ancestor, including several protein and one lipid conjugated forms. Another key aspect of the eukaryotic phase of evolution of the beta-GF was the dramatic increase in domain architectural complexity of proteins related to the expansion of UB-like domains in numerous adaptor roles.
- Wolf YI, Koonin EV
- On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization.
- Biol Direct. 2007; 2: 14-14
BACKGROUND: The origin of the translation system is, arguably, the central and the hardest problem in the study of the origin of life, and one of the hardest in all evolutionary biology. The problem has a clear catch-22 aspect: high translation fidelity can hardly be achieved without a complex, highly evolved set of RNAs and proteins but an elaborate protein machinery could not evolve without an accurate translation system. The origin of the genetic code and whether it evolved on the basis of a stereochemical correspondence between amino acids and their cognate codons (or anticodons), through selectional optimization of the code vocabulary, as a "frozen accident" or via a combination of all these routes is another wide open problem despite extensive theoretical and experimental studies. Here we combine the results of comparative genomics of translation system components, data on interaction of amino acids with their cognate codons and anticodons, and data on catalytic activities of ribozymes to develop conceptual models for the origins of the translation system and the genetic code. RESULTS: Our main guide in constructing the models is the Darwinian Continuity Principle whereby a scenario for the evolution of a complex system must consist of plausible elementary steps, each conferring a distinct advantage on the evolving ensemble of genetic elements. Evolution of the translation system is envisaged to occur in a compartmentalized ensemble of replicating, co-selected RNA segments, i.e., in an RNA World containing ribozymes with versatile activities. Since evolution has no foresight, the translation system could not evolve in the RNA World as the result of selection for protein synthesis and must have been a by-product of evolution driven by selection for another function, i.e., the translation system evolved via the exaptation route.
It is proposed that the evolutionary process that eventually led to the emergence of translation started with the selection for ribozymes binding abiogenic amino acids that stimulated ribozyme-catalyzed reactions. The proposed scenario for the evolution of translation consists of the following steps: binding of amino acids to a ribozyme resulting in an enhancement of its catalytic activity; evolution of the amino-acid-stimulated ribozyme into a peptide ligase (predecessor of the large ribosomal subunit) yielding, initially, a unique peptide activating the original ribozyme and, possibly, other ribozymes in the ensemble; evolution of self-charging proto-tRNAs that were selected, initially, for accumulation of amino acids, and subsequently, for delivery of amino acids to the peptide ligase; joining of the peptide ligase with a distinct RNA molecule (predecessor of the small ribosomal subunit) carrying a built-in template for more efficient, complementary binding of charged proto-tRNAs; evolution of the ability of the peptide ligase to assemble peptides using exogenous RNAs as template for complementary binding of charged proto-tRNAs, yielding peptides with the potential to activate different ribozymes; evolution of the translocation function of the protoribosome leading to the production of increasingly longer peptides (the first proteins), i.e., the origin of translation. The specifics of the recognition of amino acids by proto-tRNAs and the origin of the genetic code depend on whether or not there is a physical affinity between amino acids and their cognate codons or anticodons, a problem that remains unresolved. CONCLUSION: We describe a stepwise model for the origin of the translation system in the ancient RNA world such that each step confers a distinct advantage onto an ensemble of co-evolving genetic elements.
Under this scenario, the primary cause for the emergence of translation was the ability of amino acids and peptides to stimulate reactions catalyzed by ribozymes. Thus, the translation system might have evolved as the result of selection for ribozymes capable of, initially, efficient amino acid binding, and subsequently, synthesis of increasingly versatile peptides. Several aspects of this scenario are amenable to experimental testing.
- Wang M, Yafremava LS, Caetano-Anolles D, Mittenthal JE, Caetano-Anolles G
- Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world.
- Genome Res. 2007; 17: 1572-85
The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.
- Brochier-Armanet C, Forterre P
- Widespread distribution of archaeal reverse gyrase in thermophilic bacteria suggests a complex history of vertical inheritance and lateral gene transfers.
- Archaea. 2007; 2: 83-93
Reverse gyrase, an enzyme of uncertain function, is present in all hyperthermophilic archaea and bacteria. Previous phylogenetic studies have suggested that the gene for reverse gyrase has an archaeal origin and was transferred laterally (LGT) to the ancestors of the two bacterial hyperthermophilic phyla, Thermotogales and Aquificales. Here, we performed an in-depth analysis of the evolutionary history of reverse gyrase in light of genomic progress. We found genes coding for reverse gyrase in the genomes of several thermophilic bacteria that belong to phyla other than Aquificales and Thermotogales. Several of these bacteria are not, strictly speaking, hyperthermophiles because their reported optimal growth temperatures are below 80 degrees C. Furthermore, we detected a reverse gyrase gene in the sequence of the large plasmid of Thermus thermophilus strain HB8, suggesting a possible mechanism of transfer to the T. thermophilus strain HB8 involving plasmids and transposases. The archaeal part of the reverse gyrase tree is congruent with recent phylogenies of the archaeal domain based on ribosomal proteins or RNA polymerase subunits. Although poorly resolved, the complete reverse gyrase phylogeny suggests an ancient acquisition of the gene by bacteria via one or two LGT events, followed by its secondary distribution by LGT within bacteria. Finally, several genes of archaeal origin located in proximity to the reverse gyrase gene in bacterial genomes have bacterial homologues mostly in thermophiles or hyperthermophiles, raising the possibility that they were co-transferred with the reverse gyrase gene. Our new analysis of the reverse gyrase history strengthens the hypothesis that the acquisition of reverse gyrase may have been a crucial evolutionary step in the adaptation of bacteria to high-temperature environments. However, it also questions the role of this enzyme in thermophilic bacteria and the selective advantage its presence could provide.
- Rounge TB, Rohrlack T, Tooming-Klunderud A, Kristensen T, Jakobsen KS
- Comparison of cyanopeptolin genes in Planktothrix, Microcystis, and Anabaena strains: evidence for independent evolution within each genus.
- Appl Environ Microbiol. 2007; 73: 7322-30
The major cyclic peptide cyanopeptolin 1138, produced by Planktothrix strain NIVA CYA 116, was characterized and shown to be structurally very close to the earlier-characterized oscillapeptin E. A cyanopeptolin gene cluster likely to encode the corresponding peptide synthetase was sequenced from the same strain. The 30-kb oci gene cluster contains two novel domains previously not detected in nonribosomal peptide synthetase gene clusters (a putative glyceric acid-activating domain and a sulfotransferase domain), in addition to seven nonribosomal peptide synthetase modules. Unlike in two previously described cyanopeptolin gene clusters from Anabaena and Microcystis, a halogenase gene is not present. The three cyanopeptolin gene clusters show similar gene and domain arrangements, while the binding pocket signatures deduced from the adenylation domain sequences and the additional tailoring domains vary. This suggests loss and gain of tailoring domains within each genus, after the diversification of the three clades, as major events leading to the present diversity. The ABC transporter genes associated with the cyanopeptolin gene clusters form a monophyletic clade and accordingly are likely to have evolved as part of the functional unit. Phylogenetic analyses of adenylation and condensation domains, including domains from cyanopeptolins and microcystins, show a closer similarity between the Planktothrix and Microcystis cyanopeptolin domains than between these and the Anabaena domain. No clear evidence of recombination between cyanopeptolins and microcystins could be detected. There were no strong indications of horizontal gene transfer of cyanopeptolin gene sequences across the three genera, supporting independent evolution within each genus.
- Smits P, Smeitink JA, van den Heuvel LP, Huynen MA, Ettema TJ
- Reconstructing the evolution of the mitochondrial ribosomal proteome.
- Nucleic Acids Res. 2007; 35: 4686-703
For production of proteins that are encoded by the mitochondrial genome, mitochondria rely on their own mitochondrial translation system, with the mitoribosome as its central component. Using extensive homology searches, we have reconstructed the evolutionary history of the mitoribosomal proteome that is encoded by a diverse subset of eukaryotic genomes, revealing an ancestral ribosome of alpha-proteobacterial descent that more than doubled its protein content in most eukaryotic lineages. We observe large variations in the protein content of mitoribosomes between different eukaryotes, with mammalian mitoribosomes sharing only 74 and 43% of their proteins with yeast and Leishmania mitoribosomes, respectively. We detected many previously unidentified mitochondrial ribosomal proteins (MRPs) and found that several have increased in size compared to their bacterial ancestral counterparts by addition of functional domains. Several new MRPs have originated via duplication of existing MRPs as well as by recruitment from outside of the mitoribosomal proteome. Using sensitive profile-profile homology searches, we found hitherto undetected homology between bacterial and eukaryotic ribosomal proteins, as well as between fungal and mammalian ribosomal proteins, detecting two novel human MRPs. These newly detected MRPs constitute, along with evolutionary conserved MRPs, excellent new screening targets for human patients with unresolved mitochondrial oxidative phosphorylation disorders.
- Nikolskaya AN, Arighi CN, Huang H, Barker WC, Wu CH
- PIRSF family classification system for protein functional and evolutionary analysis.
- Evol Bioinform Online. 2006; 2: 197-209
The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
- Iyer LM, Balaji S, Koonin EV, Aravind L
- Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.
- Virus Res. 2006; 117: 156-84
A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. 
It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria (primarily, endosymbionts and parasites). Amongst the expansions, there are multiple families of predicted virus-specific signaling and regulatory domains. Most NCLDVs have also acquired large arrays of genes related to ubiquitin signaling, and the animal viruses in particular have independently evolved several defenses against apoptosis and immune response, including growth factors and potential inhibitors of cytokine signaling. The mimivirus displays an enormous array of genes of bacterial provenance, including a representative of a new class of predicted papain-like peptidases. It is further demonstrated that a significant number of genes found in NCLDVs also have homologs in bacteriophages, although a vertical relationship between the NCLDVs and a particular bacteriophage group could not be established. On the basis of these observations, two alternative scenarios for the origin of the NCLDVs and other groups of large DNA viruses of eukaryotes are considered. One of these scenarios posits an early assembly of an already large DNA virus precursor from which various large DNA viruses diverged through an ongoing process of displacement of the original genes by xenologous or non-orthologous genes from various sources. The second scenario posits convergent emergence, on multiple occasions, of large DNA viruses from small plasmid-like precursors through independent accretion of similar sets of genes due to strong selective pressures imposed by their life cycles and hosts.
- Kasai K, Nishizawa T, Takahashi K, Hosaka T, Aoki H, Ochi K
- Physiological analysis of the stringent response elicited in an extreme thermophilic bacterium, Thermus thermophilus.
- J Bacteriol. 2006; 188: 7111-22
Guanosine tetraphosphate (ppGpp) is a key mediator of stringent control, an adaptive response of bacteria to amino acid starvation, and has thus been termed a bacterial alarmone. Previous X-ray crystallographic analysis has provided a structural basis for the transcriptional regulation of RNA polymerase activity by ppGpp in the thermophilic bacterium Thermus thermophilus. Here we investigated the physiological basis of the stringent response by comparing the changes in intracellular ppGpp levels and the rate of RNA synthesis in stringent (rel(+); wild type) and relaxed (relA and relC; mutant) strains of T. thermophilus. We found that in wild-type T. thermophilus, as in other bacteria, serine hydroxamate, an amino acid analogue that inhibits tRNA(Ser) aminoacylation, elicited a stringent response characterized in part by intracellular accumulation of ppGpp and that this response was completely blocked in a relA-null mutant and partially blocked in a relC mutant harboring a mutation in the ribosomal protein L11. Subsequent in vitro assays using ribosomes isolated from wild-type and relA and relC mutant strains confirmed that (p)ppGpp is synthesized by ribosomes and that mutation of RelA or L11 blocks that activity. This conclusion was further confirmed in vitro by demonstrating that thiostrepton or tetracycline inhibits (p)ppGpp synthesis. In an in vitro system, (p)ppGpp acted by inhibiting RNA polymerase-catalyzed 23S/5S rRNA gene transcription but at a concentration much higher than that of the observed intracellular ppGpp pool size. On the other hand, changes in the rRNA gene promoter activity tightly correlated with changes in the GTP but not ATP concentration. Also, (p)ppGpp exerted a potent inhibitory effect on IMP dehydrogenase activity. The present data thus complement the earlier structural analysis by providing physiological evidence that T. thermophilus does produce ppGpp in response to amino acid starvation in a ribosome-dependent (i.e., RelA-dependent) manner. However, it appears that in T. thermophilus, rRNA promoter activity is controlled directly by the GTP pool size, which is modulated by ppGpp via inhibition of IMP dehydrogenase activity. Thus, unlike the case of Escherichia coli, ppGpp may not inhibit T. thermophilus RNA polymerase activity directly in vivo, as recently proposed for Bacillus subtilis rRNA transcription (L. Krasny and R. L. Gourse, EMBO J. 23:4473-4483, 2004).
- Sekine S, Shichiri M, Bernier S, Chenevert R, Lapointe J, Yokoyama S
- Structural bases of transfer RNA-dependent amino acid recognition and activation by glutamyl-tRNA synthetase.
- Structure. 2006; 14: 1791-9
Glutamyl-tRNA synthetase (GluRS) is one of the aminoacyl-tRNA synthetases that require the cognate tRNA for specific amino acid recognition and activation. We analyzed the role of tRNA in amino acid recognition by crystallography. In the GluRS*tRNA(Glu)*Glu structure, GluRS and tRNA(Glu) collaborate to form a highly complementary L-glutamate-binding site. This collaborative site is functional, as it is formed in the same manner in pretransition-state mimic, GluRS*tRNA(Glu)*ATP*Eol (a glutamate analog), and posttransition-state mimic, GluRS*tRNA(Glu)*ESA (a glutamyl-adenylate analog) structures. In contrast, in the GluRS*Glu structure, only GluRS forms the amino acid-binding site, which is defective and accounts for the binding of incorrect amino acids, such as D-glutamate and L-glutamine. Therefore, tRNA(Glu) is essential for formation of the completely functional binding site for L-glutamate. These structures, together with our previously described structures, reveal that tRNA plays a crucial role in accurate positioning of both L-glutamate and ATP, thus driving the amino acid activation.
- Vinogradov SN et al.
- A phylogenomic profile of globins.
- BMC Evol Biol. 2006; 6: 31
BACKGROUND: Globins occur in all three kingdoms of life: they can be classified into single-domain globins and chimeric globins. The latter comprise the flavohemoglobins with a C-terminal FAD-binding domain and the gene-regulating globin coupled sensors, with variable C-terminal domains. The single-domain globins encompass sequences related to chimeric globins and "truncated" hemoglobins with a 2-over-2 instead of the canonical 3-over-3 alpha-helical fold. RESULTS: A census of globins in 26 archaeal, 245 bacterial and 49 eukaryote genomes was carried out. Only approximately 25% of archaea have globins, including globin coupled sensors, related single domain globins and 2-over-2 globins. From one to seven globins per genome were found in approximately 65% of the bacterial genomes: the presence and number of globins are positively correlated with genome size. Globins appear to be mostly absent in Bacteroidetes/Chlorobi, Chlamydia, Lactobacillales, Mollicutes, Rickettsiales, Pasteurellales and Spirochaetes. Single domain globins occur in metazoans and flavohemoglobins are found in fungi, diplomonads and mycetozoans. Although red algae have single domain globins, including 2-over-2 globins, the green algae and ciliates have only 2-over-2 globins. Plants have symbiotic and nonsymbiotic single domain hemoglobins and 2-over-2 hemoglobins. Over 90% of eukaryotes have globins: the nematode Caenorhabditis has the most putative globins, approximately 33. No globins occur in the parasitic, unicellular eukaryotes such as Encephalitozoon, Entamoeba, Plasmodium and Trypanosoma. CONCLUSION: Although Bacteria have all three types of globins, Archaea do not have flavohemoglobins and Eukaryotes lack globin coupled sensors. Since the hemoglobins in organisms other than animals are enzymes or sensors, it is likely that the evolution of an oxygen transport function accompanied the emergence of multicellular animals.
- Vinogradov SN et al.
- Three globin lineages belonging to two structural classes in genomes from the three kingdoms of life.
- Proc Natl Acad Sci U S A. 2005; 102: 11385-9
Although most globins, including the N-terminal domains within chimeric proteins such as flavohemoglobins and globin-coupled sensors, exhibit a 3/3 helical sandwich structure, many bacterial, plant, and ciliate globins have a 2/2 helical sandwich structure. We carried out a comprehensive survey of globins in the genomes from the three kingdoms of life. Bayesian phylogenetic trees based on manually aligned sequences indicate the possibility of past horizontal globin gene transfers from bacteria to eukaryotes. blastp searches revealed the presence of 3/3 single-domain globins related to the globin domains of the bacterial and fungal flavohemoglobins in many bacteria, a red alga, and a diatom. Iterated psi-blast searches based on groups of globin sequences found that only the single-domain globins and flavohemoglobins recognize the eukaryote 3/3 globins, including vertebrate neuroglobins, alpha- and beta-globins, and cytoglobins. The 2/2 globins recognize the flavohemoglobins, as do the globin coupled sensors and the closely related single-domain protoglobins. However, the 2/2 globins and the globin-coupled sensors do not recognize each other. Thus, all globins appear to be distributed among three lineages: (i) the 3/3 plant and metazoan globins, single-domain globins, and flavohemoglobins; (ii) the bacterial 3/3 globin-coupled sensors and protoglobins; and (iii) the bacterial, plant, and ciliate 2/2 globins. The three lineages may have evolved from an ancestral 3/3 or 2/2 globin. Furthermore, it appears likely that the predominant functions of globins are enzymatic and that oxygen transport is a specialized development that accompanied the evolution of metazoans.
- Theobald DL, Wuttke DS
- Divergent evolution within protein superfolds inferred from profile-based phylogenetics.
- J Mol Biol. 2005; 354: 722-37
Many dissimilar protein sequences fold into similar structures. A central and persistent challenge facing protein structural analysis is the discrimination between homology and convergence for structurally similar domains that lack significant sequence similarity. Classic examples are the OB-fold and SH3 domains, both small, modular beta-barrel protein superfolds. The similarities among these domains have variously been attributed to common descent or to convergent evolution. Using a sequence profile-based phylogenetic technique, we analyzed all structurally characterized OB-fold, SH3, and PDZ domains with less than 40% mutual sequence identity. An all-against-all, profile-versus-profile analysis of these domains revealed many previously undetectable significant interrelationships. The matrices of scores were used to infer phylogenies based on our derivation of the relationships between sequence similarity E-values and evolutionary distances. The resulting clades of domains correlate remarkably well with biological function, as opposed to structural similarity, indicating that the functionally distinct sub-families within these superfolds are homologous. This method extends phylogenetics into the challenging "twilight zone" of sequence similarity, providing the first objective resolution of deep evolutionary relationships among distant protein families.
- Suchard MA
- Stochastic models for horizontal gene transfer: taking a random walk through tree space.
- Genetics. 2005; 170: 419-31
Horizontal gene transfer (HGT) plays a critical role in evolution across all domains of life with important biological and medical implications. I propose a simple class of stochastic models to examine HGT using multiple orthologous gene alignments. The models function in a hierarchical phylogenetic framework. The top level of the hierarchy is based on a random walk process in "tree space" that allows for the development of a joint probabilistic distribution over multiple gene trees and an unknown, but estimable species tree. I consider two general forms of random walks. The first form is derived from the subtree prune and regraft (SPR) operator that mirrors the observed effects that HGT has on inferred trees. The second form is based on walks over complete graphs and offers numerically tractable solutions for an increasing number of taxa. The bottom level of the hierarchy utilizes standard phylogenetic models to reconstruct gene trees given multiple gene alignments conditional on the random walk process. I develop a well-mixing Markov chain Monte Carlo algorithm to fit the models in a Bayesian framework. I demonstrate the flexibility of these stochastic models to test competing ideas about HGT by examining the complexity hypothesis. Using 144 orthologous gene alignments from six prokaryotes previously collected and analyzed, Bayesian model selection finds support for (1) the SPR model over the alternative form, (2) the 16S rRNA reconstruction as the most likely species tree, and (3) increased HGT of operational genes compared to informational genes.
- Andersson JO, Sarchfield SW, Roger AJ
- Gene transfers from nanoarchaeota to an ancestor of diplomonads and parabasalids.
- Mol Biol Evol. 2005; 22: 85-90
Rare evolutionary events, such as lateral gene transfers and gene fusions, may be useful to pinpoint, and correlate the timing of, key branches across the tree of life. For example, the shared possession of a transferred gene indicates a phylogenetic relationship among organismal lineages by virtue of their shared common ancestral recipient. Here, we present phylogenetic analyses of prolyl-tRNA and alanyl-tRNA synthetase genes that indicate lateral gene transfer events to an ancestor of the diplomonads and parabasalids from lineages more closely related to the newly discovered archaeal hyperthermophile Nanoarchaeum equitans (Nanoarchaeota) than to Crenarchaeota or Euryarchaeota. The support for this scenario is strong from all applied phylogenetic methods for the alanyl-tRNA sequences, whereas the phylogenetic analyses of the prolyl-tRNA sequences show some disagreements between methods, indicating that the donor lineage cannot be identified with a high degree of certainty. However, in both trees, the diplomonads and parabasalids branch together within the Archaea, strongly suggesting that these two groups of unicellular eukaryotes, often regarded as the two earliest independent offshoots of the eukaryotic lineage, share a common ancestor to the exclusion of the eukaryotic root. Unfortunately, the phylogenetic analyses of these two aminoacyl-tRNA synthetase genes are inconclusive regarding the position of the diplomonad/parabasalid group within the eukaryotes. Our results also show that the lineage leading to Nanoarchaeota branched off from Euryarchaeota and Crenarchaeota before the divergence of diplomonads and parabasalids, that this unexplored archaeal diversity, currently only represented by the hyperthermophilic organism Nanoarchaeum equitans, may include members living in close proximity to mesophilic eukaryotes, and that the presence of split genes in the Nanoarchaeum genome is a derived feature.
- Rogozin IB, Sverdlov AV, Babenko VN, Koonin EV
- Analysis of evolution of exon-intron structure of eukaryotic genes.
- Brief Bioinform. 2005; 6: 118-34
The availability of multiple, complete eukaryotic genome sequences allows one to address many fundamental evolutionary questions on genome scale. One such important, long-standing problem is evolution of exon-intron structure of eukaryotic genes. Analysis of orthologous genes from completely sequenced genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists. The data on shared and lineage-specific intron positions were used as the starting point for evolutionary reconstruction with parsimony and maximum-likelihood approaches. Parsimony methods produce reconstructions with intron-rich ancestors but also infer lineage-specific, in many cases, high levels of intron loss and gain. Different probabilistic models gave opposite results, apparently depending on model parameters and assumptions, from domination of intron loss, with extremely intron-rich ancestors, to dramatic excess of gains, to the point of denying any true conservation of intron positions among deep eukaryotic lineages. Development of models with adequate, realistic parameters and assumptions seems to be crucial for obtaining more definitive estimates of intron gain and loss in different eukaryotic lineages. Many shared intron positions were detected in ancestral eukaryotic paralogues which evolved by duplication prior to the divergence of extant eukaryotic lineages. These findings indicate that numerous introns were present in eukaryotic genes already at the earliest stages of evolution of eukaryotes and are compatible with the hypothesis that the original, catastrophic intron invasion accompanied the emergence of the eukaryotic cells. Comparison of various features of old and younger introns starts shedding light on probable mechanisms of intron insertion, indicating that propagation of old introns is unlikely to be a major mechanism for origin of new ones. 
The existence and structure of ancestral protosplice sites were addressed by examining the context of introns inserted within codons that encode amino acids conserved in all eukaryotes and, accordingly, are not subject to selection for splicing efficiency. It was shown that introns indeed predominantly insert into or are fixed in specific protosplice sites which have the consensus sequence (A/C)AG|Gt.
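As a small illustration of the protosplice consensus quoted in the abstract above, the sketch below scans a sequence for (A/C)AG|G and reports where an intron would sit. The function name, the example sequence, and the choice to treat the lowercase "t" as weakly conserved (reported but not required) are assumptions made for illustration, not details taken from the paper.

```python
import re

# Consensus from the abstract: (A/C)AG|Gt, where "|" marks the intron
# insertion point. Assumption: the lowercase "t" is weakly conserved,
# so it is only reported, not required. The lookahead allows
# overlapping candidate sites to be found.
PROTOSPLICE = re.compile(r"(?=([AC]AG)(G)(.?))")

def protosplice_sites(seq):
    """Return (insertion_index, has_t) for each consensus match.

    insertion_index is the 0-based position of the G that follows the
    insertion point, so an intron would sit between seq[i-1] and seq[i].
    """
    seq = seq.upper()
    sites = []
    for m in PROTOSPLICE.finditer(seq):
        insertion_index = m.start() + 3
        sites.append((insertion_index, m.group(3) == "T"))
    return sites

# Made-up example sequence containing CAG|GT and AAG|GC.
example = "TTCAGGTACCAAGGCA"
sites = protosplice_sites(example)
print(sites)
```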
- Cai W, Pei J, Grishin NV
- Reconstruction of ancestral protein sequences and its applications.
- BMC Evol Biol. 2004; 4: 33
BACKGROUND: Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space thus aiding remote homology inference. RESULTS: We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences that takes into account the observed variation of evolutionary rates between positions that more precisely describes the evolution of protein families. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that the usage of hypothetical ancestors together with the present day sequences improves profile-based sequence similarity searches; and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity. CONCLUSIONS: As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps detection of remote homologs and prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from ftp://iole.swmed.edu/pub/ANCESCON/.
- Leipe DD, Koonin EV, Aravind L
- STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer.
- J Mol Biol. 2004; 343: 1-28
Using sequence profile analysis and sequence-based structure predictions, we define a previously unrecognized, widespread class of P-loop NTPases. The signal transduction ATPases with numerous domains (STAND) class includes the AP-ATPases (animal apoptosis regulators CED4/Apaf-1, plant disease resistance proteins, and bacterial AfsR-like transcription regulators) and NACHT NTPases (e.g. NAIP, TLP1, Het-E-1) that have been studied extensively in the context of apoptosis, pathogen response in animals and plants, and transcriptional regulation in bacteria. We show that, in addition to these well-characterized protein families, the STAND class includes several other groups of (predicted) NTPase domains from diverse signaling and transcription regulatory proteins from bacteria and eukaryotes, and three Archaea-specific families. We identified the STAND domain in several biologically well-characterized proteins that have not been suspected to have NTPase activity, including soluble adenylyl cyclases, nephrocystin 3 (implicated in polycystic kidney disease), and Rolling pebble (a regulator of muscle development); these findings are expected to facilitate elucidation of the functions of these proteins. The STAND class belongs to the additional strand, catalytic E division of P-loop NTPases together with the AAA+ ATPases, RecA/helicase-related ATPases, ABC-ATPases, and VirD4/PilT-like ATPases. The STAND proteins are distinguished from other P-loop NTPases by the presence of unique sequence motifs associated with the N-terminal helix and the core strand-4, as well as a C-terminal helical bundle that is fused to the NTPase domain. This helical module contains a signature GxP motif in the loop between the two distal helices. With the exception of the archaeal families, almost all STAND NTPases are multidomain proteins containing three or more domains. 
In addition to the NTPase domain, these proteins typically contain DNA-binding or protein-binding domains, superstructure-forming repeats, such as WD40 and TPR, and enzymatic domains involved in signal transduction, including adenylate cyclases and kinases. By analogy to the AAA+ ATPases, it can be predicted that STAND NTPases use the C-terminal helical bundle as a "lever" to transmit the conformational changes brought about by NTP hydrolysis to effector domains. STAND NTPases represent a novel paradigm in signal transduction, whereby adaptor, regulatory switch, scaffolding, and, in some cases, signal-generating moieties are combined into a single polypeptide. The STAND class consists of 14 distinct families, and the evolutionary history of most of these families is riddled with dramatic instances of lineage-specific expansion and apparent horizontal gene transfer. The STAND NTPases are most abundant in developmentally and organizationally complex prokaryotes and eukaryotes. Transfer of genes for STAND NTPases from bacteria to eukaryotes on several occasions might have played a significant role in the evolution of eukaryotic signaling systems.
- Novichkov PS, Omelchenko MV, Gelfand MS, Mironov AA, Wolf YI, Koonin EV
- Genome-wide molecular clock and horizontal gene transfer in bacterial evolution.
- J Bacteriol. 2004; 186: 6575-85
We describe a simple theoretical framework for identifying orthologous sets of genes that deviate from a clock-like model of evolution. The approach used is based on comparing the evolutionary distances within a set of orthologs to a standard intergenomic distance, which was defined as the median of the distribution of the distances between all one-to-one orthologs. Under the clock-like model, the points on a plot of intergenic distances versus intergenomic distances are expected to fit a straight line. A statistical technique to identify significant deviations from the clock-like behavior is described. For several hundred analyzed orthologous sets representing three well-defined bacterial lineages, the alpha-Proteobacteria, the gamma-Proteobacteria, and the Bacillus-Clostridium group, the clock-like null hypothesis could not be rejected for approximately 70% of the sets, whereas the rest showed substantial anomalies. Subsequent detailed phylogenetic analysis of the genes with the strongest deviations indicated that over one-half of these genes probably underwent a distinct form of horizontal gene transfer, xenologous gene displacement, in which a gene is displaced by an ortholog from a different lineage. The remaining deviations from the clock-like model could be explained by lineage-specific acceleration of evolution. The results indicate that although xenologous gene displacement is a major force in bacterial evolution, a significant majority of orthologous gene sets in three major bacterial lineages evolved in accordance with the clock-like model. The approach described here allows rapid detection of deviations from this mode of evolution on the genome scale.
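The framework summarized in the abstract above can be sketched roughly as follows: the intergenomic distance for each genome pair is taken as the median over all one-to-one ortholog distances, and one ortholog set's distances are then fitted against those values. The function names, the input layout, the toy numbers, and the through-origin least-squares fit are assumptions for illustration; the paper's actual statistical test for significant deviations is not reproduced here.

```python
from statistics import median

def intergenomic_distances(ortholog_distances):
    """Median distance over all one-to-one orthologs, per genome pair.

    ortholog_distances: dict mapping a genome pair to a list of
    distances, one per orthologous gene set (hypothetical layout).
    """
    return {pair: median(ds) for pair, ds in ortholog_distances.items()}

def clock_fit(gene_distances, genome_distances):
    """Fit gene distance = slope * intergenomic distance (through origin).

    Returns the slope and the residual per genome pair. Large residuals
    flag pairs where this ortholog set departs from clock-like behavior.
    """
    pairs = sorted(gene_distances)
    x = [genome_distances[p] for p in pairs]
    y = [gene_distances[p] for p in pairs]
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    residuals = {p: yi - slope * xi for p, xi, yi in zip(pairs, x, y)}
    return slope, residuals

# Toy data (made up): three genome pairs, five ortholog sets each.
all_sets = {
    ("A", "B"): [0.10, 0.12, 0.08, 0.11, 0.09],
    ("A", "C"): [0.20, 0.22, 0.18, 0.21, 0.19],
    ("B", "C"): [0.30, 0.33, 0.27, 0.31, 0.29],
}
genome_d = intergenomic_distances(all_sets)

# One ortholog set that evolves exactly twice as fast as the median gene:
# it still lies on a straight line, only with a steeper slope.
gene_d = {pair: 2 * d for pair, d in genome_d.items()}
slope, residuals = clock_fit(gene_d, genome_d)
```

In this toy case the residuals are essentially zero, so the set would count as clock-like; a set displaced by a xenologous ortholog would instead show large residuals for the genome pairs involving the recipient lineage.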
- Pearson A, Budin M, Brocks JJ
- Phylogenetic and biochemical evidence for sterol synthesis in the bacterium Gemmata obscuriglobus.
- Proc Natl Acad Sci U S A. 2003; 100: 15352-7
Sterol biosynthesis is viewed primarily as a eukaryotic process, and the frequency of its occurrence in bacteria has long been a subject of controversy. Two enzymes, squalene monooxygenase and oxidosqualene cyclase, are the minimum necessary for initial biosynthesis of sterols from squalene. In this work, 19 protein gene sequences for eukaryotic squalene monooxygenase and 12 protein gene sequences for eukaryotic oxidosqualene cyclase were compared with all available complete and partial prokaryotic genomes. The only unequivocal matches for a sterol biosynthetic pathway were in the proteobacterium, Methylococcus capsulatus, in which sterol biosynthesis is known, and in the planctomycete, Gemmata obscuriglobus. The latter species contains the most abbreviated sterol pathway yet identified in any organism. Analysis shows that the major sterols in Gemmata are lanosterol and its uncommon isomer, parkeol. There are no subsequent modifications of these products. In bacteria, the sterol biosynthesis genes occupy a contiguous coding region and possibly comprise a single operon. Phylogenetic trees constructed for both enzymes show that the sterol pathway in bacteria and eukaryotes has a common ancestry. It is likely that this contiguous reading frame was exchanged between bacteria and early eukaryotes via lateral gene transfer or endosymbiotic events. The primitive sterols produced by Gemmata suggest that this genus could retain the most ancient remnants of the sterol biosynthetic pathway.
- Teplyakov A et al.
- Crystal structure of the YchF protein reveals binding sites for GTP and nucleic acid.
- J Bacteriol. 2003; 185: 4031-7
The bacterial protein encoded by the gene ychF is 1 of 11 universally conserved GTPases and the only one whose function is unknown. The crystal structure determination of YchF was sought to help with the functional assignment of the protein. The YchF protein from Haemophilus influenzae was cloned and expressed, and the crystal structure was determined at 2.4 Å resolution. The polypeptide chain is folded into three domains. The N-terminal domain has a mononucleotide binding fold typical for the P-loop NTPases. An 80-residue domain next to it has a pronounced alpha-helical coiled coil. The C-terminal domain features a six-stranded half-barrel that curves around an alpha-helix. The crablike three-domain structure of YchF suggests the binding site for a double-stranded nucleic acid in the cleft between the domains. The structure of the putative GTP-binding site is consistent with the postulated guanine specificity of the protein. Fluorescence measurements have demonstrated the ability of YchF to bind a double-stranded nucleic acid and GTP. Taken together with other experimental data and genomic analysis, these results suggest that YchF may be part of a nucleoprotein complex and may function as a GTP-dependent translation factor.
- Patthy L
- Modular assembly of genes and the evolution of new functions.
- Genetica. 2003; 118: 217-31
Modular assembly of novel genes from existing genes has long been thought to be an important source of evolutionary novelty. Thanks to major advances in genomic studies it has now become clear that this mechanism contributed significantly to the evolution of novel biological functions in different evolutionary lineages. Analyses of completely sequenced bacterial, archaeal and eukaryotic genomes have revealed that modular assembly of novel constituents of various eukaryotic intracellular signalling pathways played a major role in the evolution of eukaryotes. Comparison of the genomes of single-celled eukaryotes, multicellular plants and animals has also shown that the evolution of multicellularity was accompanied by the assembly of numerous novel extracellular matrix proteins and extracellular signalling proteins that are absolutely essential for multicellularity. There is now strong evidence that exon-shuffling played a general role in the assembly of the modular proteins involved in extracellular communications of metazoa. Although some of these proteins seem to be shared by all major groups of metazoa, others are restricted to certain evolutionary lineages. The genomic features of the chordates appear to have favoured intronic recombination as evidenced by the fact that exon-shuffling continued to be a major source of evolutionary novelty during vertebrate evolution.
- Koonin EV, Makarova KS, Rogozin IB, Davidovic L, Letellier MC, Pellegrini L
- The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers.
- Genome Biol. 2003; 4: 19-19
BACKGROUND: The rhomboid family of polytopic membrane proteins shows a level of evolutionary conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the basis of experimental studies with the developmental regulator rhomboid from Drosophila and the AarA protein from the bacterium Providencia stuartii, the rhomboids are thought to be intramembrane serine proteases whose signaling function is conserved in eukaryotes and prokaryotes. RESULTS: Phylogenetic tree analysis carried out using several independent methods for tree constructions and the corresponding statistical tests suggests that, despite its broad distribution in all three superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct, ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification through the addition of extra transmembrane helices and other domains in different orientations relative to the conserved core that harbors the protease activity. CONCLUSIONS: Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the reconstruction of ancestral life forms. A hypothetical scenario for the origin of intracellular membrane proteases from membrane transporters is proposed.
- Tworowski D, Safro M
- The long-range electrostatic interactions control tRNA-aminoacyl-tRNA synthetase complex formation.
- Protein Sci. 2003; 12: 1247-51
In most cases aminoacyl-tRNA synthetases (aaRSs) are negatively charged, as are the tRNA substrates. It is apparent that there are driving forces that provide a long-range attraction between like-charged aaRS and tRNA, and ensure formation of "close encounters." Based on numerical solutions to the nonlinear Poisson-Boltzmann equation, we evaluated the electrostatic potential generated by different aaRSs. The 3D-isopotential surfaces calculated for different aaRSs at 0.01 kT/e contour level reveal the presence of large positive patches, one patch for each tRNA molecule. This is true for classes I and II monomers, dimers, and heterotetramers. The potential maps keep their characteristic features over a wide range of contour levels. The results suggest that nonspecific electrostatic interactions are the driving forces of primary stickiness of aaRSs-tRNA complexes. The long-range attraction in aaRS-tRNA systems is explained by capture of negatively charged tRNA into "blue space area" of the positive potential generated by aaRSs. Localization of tRNA in this area is a prerequisite for overcoming the barrier of Brownian motion.
- Harris JK, Kelley ST, Spiegelman GB, Pace NR
- The genetic core of the universal ancestor.
- Genome Res. 2003; 13: 407-12
Molecular analysis of conserved sequences in the ribosomal RNAs of modern organisms reveals a three-domain phylogeny that converges in a universal ancestor for all life. We used the Clusters of Orthologous Groups database and information from published genomes to search for other universally conserved genes that have the same phylogenetic pattern as ribosomal RNA, and therefore constitute the ancestral genetic core of cells. Our analyses identified a small set of genes that can be traced back to the universal ancestor and have coevolved since that time. As indicated by earlier studies, almost all of these genes are involved with the transfer of genetic information, and most of them directly interact with the ribosome. Other universal genes have either undergone lateral transfer in the past, or have diverged so much in sequence that their distant past could not be resolved. The nature of the conserved genes suggests innovations that may have been essential to the divergence of the three domains of life. The analysis also identified several genes of unknown function with phylogenies that track with the ribosomal RNA genes. The products of these genes are likely to play fundamental roles in cellular processes.
- Ambrogelly A, Korencic D, Ibba M
- Functional annotation of class I lysyl-tRNA synthetase phylogeny indicates a limited role for gene transfer.
- J Bacteriol. 2002; 184: 4594-600
Functional and comparative genomic studies have previously shown that the essential protein lysyl-tRNA synthetase (LysRS) exists in two unrelated forms. Most prokaryotes and all eukaryotes contain a class II LysRS, whereas most archaea and a few bacteria contain a less common class I LysRS. In bacteria the class I LysRS is only found in the alpha-proteobacteria and a scattering of other groups, including the spirochetes, while the class I protein is by far the most common form of LysRS in archaea. To investigate this unusual distribution we functionally annotated a representative phylogenetic sampling of LysRS proteins. Class I LysRS proteins from a variety of bacteria and archaea were characterized in vitro by their ability to recognize Escherichia coli tRNA(Lys) anticodon mutants. Class I LysRS proteins were found to fall into two distinct groups, those that preferentially recognize the third anticodon nucleotide of tRNA(Lys) (U36) and those that recognize both the second and third positions (U35 and U36). Strong recognition of U35 and U36 was confined to the pyrococcus-spirochete grouping within the archaeal branch of the class I LysRS phylogenetic tree, while U36 recognition was seen in other archaea and an example from the alpha-proteobacteria. Together with the corresponding phylogenetic relationships, these results suggest that despite its comparative rarity the distribution of class I LysRS conforms to the canonical archaeal-bacterial division. The only exception, suggested from both functional and phylogenetic data, appears to be the horizontal transfer of class I LysRS from a pyrococcal progenitor to a limited number of bacteria.
- Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV
- A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis.
- Nucleic Acids Res. 2002; 30: 482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea (with the exception of Thermoplasma acidophilum and Halobacterium NRC-1) and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. Two more genes that belong to this neighborhood and are present in most of the genomes in which the neighborhood was detected encode, respectively, a predicted HD-superfamily hydrolase (possibly a nuclease) of a distinct family and a predicted, novel DNA polymerase. Another characteristic feature of this neighborhood is the expansion of a superfamily of paralogous, uncharacterized proteins, which are encoded by at least 20-30% of the genes in the neighborhood. The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified. This hypothetical repair system might be functionally analogous to the bacterial-eukaryotic system of translesion, mutagenic repair whose central components are DNA polymerases of the UmuC-DinB-Rad30-Rev1 superfamily, which typically are missing in thermophiles.
- Jenkins C et al.
- Genes for the cytoskeletal protein tubulin in the bacterial genus Prosthecobacter.
- Proc Natl Acad Sci U S A. 2002; 99: 17049-54
Tubulins, the protein constituents of the microtubule cytoskeleton, are present in all known eukaryotes but have never been found in the Bacteria or Archaea. Here we report the presence of two tubulin-like genes [bacterial tubulin a (btuba) and bacterial tubulin b (btubb)] in bacteria of the genus Prosthecobacter (Division Verrucomicrobia). In this study, we investigated the organization and expression of these genes and conducted a comparative analysis of the bacterial and eukaryotic protein sequences, focusing on their phylogeny and 3D structures. The btuba and btubb genes are arranged as adjacent loci within the genome along with a kinesin light chain gene homolog. RT-PCR experiments indicate that these three genes are cotranscribed, and a probable promoter was identified upstream of btuba. On the basis of comparative modeling data, we predict that the Prosthecobacter tubulins are monomeric, unlike eukaryotic alpha and beta tubulins, which form dimers, and are therefore unlikely to form microtubule-like structures. Phylogenetic analyses indicate that the Prosthecobacter tubulins are quite divergent and do not support recent horizontal transfer of the genes from a eukaryote. The discovery of genes for tubulin in a bacterial genus may offer new insights into the evolution of the cytoskeleton.
- Aravind L, Anantharaman V, Koonin EV
- Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world.
- Proteins. 2002; 48: 1-14
Protein sequence and structure comparisons show that the catalytic domains of Class I aminoacyl-tRNA synthetases, a related family of nucleotidyltransferases involved primarily in coenzyme biosynthesis, nucleotide-binding domains related to the UspA protein (USPA domains), photolyases, electron transport flavoproteins, and PP-loop-containing ATPases together comprise a distinct class of alpha/beta domains designated the HUP domain after HIGH-signature proteins, UspA, and PP-ATPase. Several lines of evidence are presented to support the monophyly of the HUP domains, to the exclusion of other three-layered alpha/beta folds with the generic "Rossmann-like" topology. Cladistic analysis, with patterns of structural and sequence similarity used as discrete characters, identified three major evolutionary lineages within the HUP domain class: the PP-ATPases; the HIGH superfamily, which includes class I aaRS and related nucleotidyltransferases containing the HIGH signature in their nucleotide-binding loop; and a previously unrecognized USPA-like group, which includes USPA domains, electron transport flavoproteins, and photolyases. Examination of the patterns of phyletic distribution of distinct families within these three major lineages suggests that the Last Universal Common Ancestor of all modern life forms encoded 15-18 distinct alpha/beta ATPases and nucleotide-binding proteins of the HUP class. This points to an extensive radiation of HUP domains before the last universal common ancestor (LUCA), during which the multiple class I aminoacyl-tRNA synthetases emerged only at a late stage. Thus, substantial evolutionary diversification of protein domains occurred well before the modern version of the protein-dependent translation machinery was established, i.e., still in the RNA world.
- Nolling J et al.
- Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum.
- J Bacteriol. 2001; 183: 4823-38
The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.
- Sicheritz-Ponten T, Andersson SG
- A phylogenomic approach to microbial evolution.
- Nucleic Acids Res. 2001; 29: 545-52
To study the origin and evolution of biochemical pathways in microorganisms, we have developed methods and software for automatic, large-scale reconstructions of phylogenetic relationships. We define the complete set of phylogenetic trees derived from the proteome of an organism as the phylome and introduce the term phylogenetic connection as a concept that describes the relative relationships between taxa in a tree. A query system has been incorporated into the system so as to allow searches for defined categories of trees within the phylome. As a complement, we have developed the pyphy system for visualising the results of complex queries on phylogenetic connections, genomic locations and functional assignments in a graphical format. Our phylogenomics approach, which links phylogenetic information to the flow of biochemical pathways within and among microbial species, has been used to examine more than 8000 phylogenetic trees from seven microbial genomes. The results have revealed a rich web of phylogenetic connections. However, the separation of Bacteria and Archaea into two separate domains remains robust.
- Ribas de Pouplana L, Brown JR, Schimmel P
- Structure-based phylogeny of class IIa tRNA synthetases in relation to an unusual biochemistry.
- J Mol Evol. 2001; 53: 261-8
The available three-dimensional information for class II aminoacyl-tRNA synthetases has been used to generate sequence alignments that strictly adhere to the structural equivalencies between members of subclass IIa of these enzymes. The resulting alignments were used to study their phylogenetic relationships. In particular, the entire set of available sequences of prolyl-tRNA synthetases was analyzed in this way. In contrast to recent reports, we conclude that the evolutionary pattern of prolyl-tRNA synthetases does not obviously conform to the canonical phylogenetic distribution. The pattern found for these enzymes may be related to their biochemical characteristics. Our results indicate a potential relationship between the evolutionary pattern of prolyl-tRNA synthetases and the emergence of two enzymatically distinct forms of these proteins.
- Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV
- Lineage-specific gene expansions in bacterial and archaeal genomes.
- Genome Res. 2001; 11: 555-65
Gene duplication is an important mechanistic antecedent to the evolution of new genes and novel biochemical functions. In an attempt to assess the contribution of gene duplication to genome evolution in archaea and bacteria, clusters of related genes that appear to have expanded subsequent to the diversification of the major prokaryotic lineages (lineage-specific expansions) were analyzed. Analysis of 21 completely sequenced prokaryotic genomes shows that lineage-specific expansions comprise a substantial fraction (approximately 5%-33%) of their coding capacities. A positive correlation exists between the fraction of the genes taken up by lineage-specific expansions and the total number of genes in a genome. Consistent with the notion that lineage-specific expansions are made up of relatively recently duplicated genes, >90% of the detected clusters consist of only two to four genes. The more common smaller clusters tend to include genes with higher pairwise similarity (as reflected by average score density) than larger clusters. Regardless of size, cluster members tend to be located more closely on bacterial chromosomes than expected by chance, which could reflect a history of tandem gene duplication. In addition to the small clusters, almost all genomes also contain rare large clusters of size ≥20. Several examples of the potential adaptive significance of these large clusters are explored. The presence or absence of clusters and their related genes was used as the basis for the construction of a similarity graph for completely sequenced prokaryotic genomes. The topology of the resulting graph seems to reflect a combined effect of common ancestry, horizontal transfer, and lineage-specific gene loss.
- Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV
- Genome trees constructed using five different approaches suggest new major bacterial clades.
- BMC Evol Biol. 2001; 1: 8-8
BACKGROUND: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. RESULTS: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota. CONCLUSIONS: We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
- Anantharaman V, Koonin EV, Aravind L
- Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains.
- J Mol Biol. 2001; 307: 1271-92
Central cellular functions such as metabolism, solute transport and signal transduction are regulated, in part, via binding of small molecules by specialized domains. Using sensitive methods for sequence profile analysis and protein structure comparison, we exhaustively surveyed the protein sets from completely sequenced genomes for all occurrences of 21 intracellular small-molecule-binding domains (SMBDs) that are represented in at least two of the three major divisions of life (bacteria, archaea and eukaryotes). These included previously characterized domains such as PAS, GAF, ACT and ferredoxins, as well as three newly predicted SMBDs, namely the 4-vinyl reductase (4VR) domain, the NIFX domain and the 3-histidines (3H) domain. Although there are only a limited number of different superfamilies of these ancient SMBDs, they are present in numerous distinct proteins combined with various enzymatic, transport and signal-transducing domains. Most of the SMBDs show considerable evolutionary mobility and are involved in the generation of many lineage-specific domain architectures. Frequent re-invention of analogous architectures involving functionally related, but not homologous, domains was detected, such as fusion of different SMBDs to several types of DNA-binding domains to form diverse transcription regulators in prokaryotes and eukaryotes. This is suggestive of similar selective forces affecting the diverse SMBDs and resulting in the formation of multidomain proteins that fit a limited number of functional stereotypes. Using the "guilt by association" approach, the identification of SMBDs allowed prediction of functions and mode of regulation for a variety of previously uncharacterized proteins.
- McClure MA
- Evolution of the DUT gene: horizontal transfer between host and pathogen in all three domains of life.
- Curr Protein Pept Sci. 2001; 2: 313-24
The ubiquity of the dut gene in Eukarya, Eubacteria, and Archaea implies its existence in the last common ancestor of the three domains of life. The dut gene exists as single, tandemly duplicated, and tandemly triplicated copies. The dUTPase is encoded as an auxiliary gene in the genomes of several DNA viruses and two distinct lineages of retroviruses. A comprehensive analysis of dUTPase amino acid sequence relationships explores the evolutionary dynamics of dut genes in viruses and their hosts. The data set comprised representative sequences from available Eukaryotes, Archaea, Eubacteria cells and viruses. A multiple alignment of these protein sequences was generated using a hidden Markov model (HMM) approach developed to align divergent data. Phylogenetic analysis revealed that horizontal transfer from hosts to virus genomes has occurred in all three domains of life. The evidence for horizontal transfers is particularly interesting in Eukaryotes as these dut genes have introns, while DNA virus dut genes do not. This implies an intermediary Retroid Agent facilitated the horizontal transfer process, via reverse transcription, between host mRNA and DNA viruses. The horizontal transfer of the dut gene from Eukaryotic, Eubacterial, and Archaeal organisms to both DNA and RNA viruses is the first documented case of host to pathogen transfer that has occurred in all three domains of life.
- Berthonneau E, Mirande M
- A gene fusion event in the evolution of aminoacyl-tRNA synthetases.
- FEBS Lett. 2000; 470: 300-4
The genes of glutamyl- and prolyl-tRNA synthetases (GluRS and ProRS) are organized differently in the three kingdoms of the tree of life. In bacteria and archaea, distinct genes encode the two proteins. In several organisms from the eukaryotic phylum of coelomate metazoans, the two polypeptides are carried by a single polypeptide chain to form a bifunctional protein. The linker region is made of imperfectly repeated units also recovered as singular or plural elements connected as N-terminal or C-terminal polypeptide extensions in various eukaryotic aminoacyl-tRNA synthetases. Phylogenetic analysis points to the monophyletic origin of this polypeptide motif appended to six different members of the synthetase family, belonging to either of the two classes of aminoacyl-tRNA synthetases. In particular, the monospecific GluRS and ProRS from Caenorhabditis elegans, an acoelomate metazoan, exhibit this recurrent motif as a C-terminal or N-terminal appendage, respectively. Our analysis of the extant motifs suggests a possible series of events responsible for a gene fusion that gave rise to the bifunctional glutamyl-prolyl-tRNA synthetase through recombination between genomic sequences encoding the repeated units.
- Grishin NV, Wolf YI, Koonin EV
- From complete genomes to measures of substitution rate variability within and between proteins.
- Genome Res. 2000; 10: 991-1000
Accumulation of complete genome sequences of diverse organisms creates new possibilities for evolutionary inferences from whole-genome comparisons. In the present study, we analyze the distributions of substitution rates among proteins encoded in 19 complete genomes (the interprotein rate distribution). To estimate these rates, it is necessary to employ another fundamental distribution, that of the substitution rates among sites in proteins (the intraprotein distribution). Using two independent approaches, we show that intraprotein substitution rate variability appears to be significantly greater than generally accepted. This yields more realistic estimates of evolutionary distances from amino-acid sequences, which is critical for evolutionary-tree construction. We demonstrate that the interprotein rate distributions inferred from the genome-to-genome comparisons are similar to each other and can be approximated by a single distribution with a long exponential shoulder. This suggests that a generalized version of the molecular clock hypothesis may be valid on genome scale. We also use the scaling parameter of the obtained interprotein rate distribution to construct a rooted whole-genome phylogeny. The topology of the resulting tree is largely compatible with those of global rRNA-based trees and trees produced by other approaches to genome-wide comparison.
- Subramanian G, Koonin EV, Aravind L
- Comparative genome analysis of the pathogenic spirochetes Borrelia burgdorferi and Treponema pallidum.
- Infect Immun. 2000; 68: 1633-48
A comparative analysis of the predicted protein sequences encoded in the complete genomes of Borrelia burgdorferi and Treponema pallidum provides a number of insights into evolutionary trends and adaptive strategies of the two spirochetes. A measure of orthologous relationships between gene sets, termed the orthology coefficient (OC), was developed. The overall OC value for the gene sets of the two spirochetes is about 0.43, which means that less than one-half of the genes show readily detectable orthologous relationships. This emphasizes significant divergence between the two spirochetes, apparently driven by different biological niches. Different functional categories of proteins as well as different protein families show a broad distribution of OC values, from near 1 (a perfect, one-to-one correspondence) to near 0. The proteins involved in core biological functions, such as genome replication and expression, typically show high OC values. In contrast, marked variability is seen among proteins that are involved in specific processes, such as nutrient transport, metabolism, gene-specific transcription regulation, signal transduction, and host response. Differences in the gene complements encoded in the two spirochete genomes suggest active adaptive evolution for their distinct niches. Comparative analysis of the spirochete genomes produced evidence of gene exchanges with other bacteria, archaea, and eukaryotic hosts that seem to have occurred at different points in the evolution of the spirochetes. Examples are presented of the use of sequence profile analysis to predict proteins that are likely to play a role in pathogenesis, including secreted proteins that contain specific protein-protein interaction domains, such as von Willebrand A, YWTD, TPR, and PR1, some of which hitherto have been reported only in eukaryotes. We tentatively reconstruct the likely evolutionary process that has led to the divergence of the two spirochete lineages; this reconstruction seems to point to an ancestral state resembling the symbiotic spirochetes found in insect guts.
- Ke D et al.
- Evidence for horizontal gene transfer in evolution of elongation factor Tu in enterococci.
- J Bacteriol. 2000; 182: 6913-20
The elongation factor Tu, encoded by tuf genes, is a GTP binding protein that plays a central role in protein synthesis. One to three tuf genes per genome are present, depending on the bacterial species. Most low-G+C-content gram-positive bacteria carry only one tuf gene. We have designed degenerate PCR primers derived from consensus sequences of the tuf gene to amplify partial tuf sequences from 17 enterococcal species and other phylogenetically related species. The amplified DNA fragments were sequenced either by direct sequencing or by sequencing cloned inserts containing putative amplicons. Two different tuf genes (tufA and tufB) were found in 11 enterococcal species, including Enterococcus avium, Enterococcus casseliflavus, Enterococcus dispar, Enterococcus durans, Enterococcus faecium, Enterococcus gallinarum, Enterococcus hirae, Enterococcus malodoratus, Enterococcus mundtii, Enterococcus pseudoavium, and Enterococcus raffinosus. For the other six enterococcal species (Enterococcus cecorum, Enterococcus columbae, Enterococcus faecalis, Enterococcus sulfureus, Enterococcus saccharolyticus, and Enterococcus solitarius), only the tufA gene was present. Based on 16S rRNA gene sequence analysis, the 11 species having two tuf genes all have a common ancestor, while the six species having only one copy diverged from the enterococcal lineage before that common ancestor. The presence of one or two copies of the tuf gene in enterococci was confirmed by Southern hybridization. Phylogenetic analysis of tuf sequences demonstrated that the enterococcal tufA gene branches with the Bacillus, Listeria, and Staphylococcus genera, while the enterococcal tufB gene clusters with the genera Streptococcus and Lactococcus. Primary structure analysis showed that four amino acid residues encoded within the sequenced regions are conserved and unique to the enterococcal tufB genes and the tuf genes of streptococci and Lactococcus lactis. The data suggest that an ancestral streptococcus or a streptococcus-related species may have horizontally transferred a tuf gene to the common ancestor of the 11 enterococcal species which now carry two tuf genes.
- Koretke KK, Lupas AN, Warren PV, Rosenberg M, Brown JR
- Evolution of two-component signal transduction.
- Mol Biol Evol. 2000; 17: 1956-70
Two-component signal transduction (TCST) systems are the principal means for coordinating responses to environmental changes in bacteria as well as some plants, fungi, protozoa, and archaea. These systems typically consist of a receptor histidine kinase, which reacts to an extracellular signal by phosphorylating a cytoplasmic response regulator, causing a change in cellular behavior. Although several model systems, including sporulation and chemotaxis, have been extensively studied, the evolutionary relationships between specific TCST systems are not well understood, and the ancestry of the signal transduction components is unclear. Phylogenetic trees of TCST components from 14 complete and 6 partial genomes, containing 183 histidine kinases and 220 response regulators, were constructed using distance methods. The trees showed extensive congruence in the positions of 11 recognizable phylogenetic clusters. Eukaryotic sequences were found almost exclusively in one cluster, which also showed the greatest extent of domain variability in its component proteins, and archaeal sequences mainly formed species-specific clusters. Three clusters in different parts of the kinase tree contained proteins with serine-phosphorylating activity. All kinases were found to be monophyletic with respect to other members of their superfamily, such as type II topoisomerases and Hsp90. Structural analysis further revealed significant similarity to the ATP-binding domain of eukaryotic protein kinases. TCST systems are of bacterial origin and radiated into archaea and eukaryotes by lateral gene transfer. Their components show extensive coevolution, suggesting that recombination has not been a major factor in their differentiation. Although histidine kinase activity is prevalent, serine kinases have evolved multiple times independently within this family, accompanied by a loss of the cognate response regulator(s). 
The structural and functional similarity between TCST kinases and eukaryotic protein kinases raises the possibility of a distant evolutionary relationship.
- Natale DA, Shankavaram UT, Galperin MY, Wolf YI, Aravind L, Koonin EV
- Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).
- Genome Biol. 2000; 1: 9-9
BACKGROUND: Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. RESULTS: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
CONCLUSIONS: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
- Woese CR
- Interpreting the universal phylogenetic tree.
- Proc Natl Acad Sci U S A. 2000; 97: 8392-6
The universal phylogenetic tree not only spans all extant life, but its root and earliest branchings represent stages in the evolutionary process before modern cell types had come into being. The evolution of the cell is an interplay between vertically derived and horizontally acquired variation. Primitive cellular entities were necessarily simpler and more modular in design than are modern cells. Consequently, horizontal gene transfer early on was pervasive, dominating the evolutionary dynamic. The root of the universal phylogenetic tree represents the first stage in cellular evolution when the evolving cell became sufficiently integrated and stable to the erosive effects of horizontal gene transfer that true organismal lineages could exist.
- Horner DS, Hirt RP, Embley TM
- A single eubacterial origin of eukaryotic pyruvate:ferredoxin oxidoreductase genes: implications for the evolution of anaerobic eukaryotes.
- Mol Biol Evol. 1999; 16: 1280-91
The iron sulfur protein pyruvate:ferredoxin oxidoreductase (PFO) is central to energy metabolism in amitochondriate eukaryotes, including those with hydrogenosomes. Thus, revealing the evolutionary history of PFO is critical to understanding the origin(s) of eukaryote anaerobic energy metabolism. We determined a complete PFO sequence for Spironucleus barkhanus, a large fragment of a PFO sequence from Clostridium pasteurianum, and a fragment of a new PFO from Giardia lamblia. Phylogenetic analyses of eubacterial and eukaryotic PFO genes suggest a complex history for PFO, including possible gene duplications and horizontal transfers among eubacteria. Our analyses favor a common origin for eukaryotic cytosolic and hydrogenosomal PFOs from a single eubacterial source, rather than from separate horizontal transfers as previously suggested. However, with the present sampling of genes and species, we were unable to infer a specific eubacterial sister group for eukaryotic PFO. Thus, we find no direct support for the published hypothesis that the donor of eukaryote PFO was the common alpha-proteobacterial ancestor of mitochondria and hydrogenosomes. We also report that several fungi and protists encode proteins with PFO domains that are likely monophyletic with PFOs from anaerobic protists. In Saccharomyces cerevisiae, PFO domains combine with fragments of other redox proteins to form fusion proteins which participate in methionine biosynthesis. Our results are consistent with the view that PFO, an enzyme previously considered to be specific to energy metabolism in amitochondriate protists, was present in the common ancestor of contemporary eukaryotes and was retained, wholly or in part, during the evolution of oxygen-dependent and mitochondrion-bearing lineages.
- Schimmel P, Ribas de Pouplana L
- Genetic code origins: experiments confirm phylogenetic predictions and may explain a puzzle.
- Proc Natl Acad Sci U S A. 1999; 96: 327-8
- Brown JR, Doolittle WF
- Gene descent, duplication, and horizontal transfer in the evolution of glutamyl- and glutaminyl-tRNA synthetases.
- J Mol Evol. 1999; 49: 485-95
In translation, separate aminoacyl-tRNA synthetases attach the 20 different amino acids to their cognate tRNAs, with the exception of glutamine. Eukaryotes and some bacteria employ a specific glutaminyl-tRNA synthetase (GlnRS) which other Bacteria, the Archaea (archaebacteria), and organelles apparently lack. Instead, tRNA(Gln) is initially acylated with glutamate by glutamyl-tRNA synthetase (GluRS), then the glutamate moiety is transamidated to glutamine. Lamour et al. [(1994) Proc Natl Acad Sci USA 91:8670-8674] suggested that an early duplication of the GluRS gene in eukaryotes gave rise to the gene for GlnRS-a copy of which was subsequently transferred to proteobacteria. However, questions remain about the occurrence of GlnRS genes among the Eucarya (eukaryotes) outside of the "crown" taxa (animals, fungi, and plants), the distribution of GlnRS genes in the Bacteria, and their evolutionary relationships to genes from the Archaea. Here, we show that GlnRS occurs in the most deeply branching eukaryotes and that putative GluRS genes from the Archaea are more closely related to GlnRS and GluRS genes of the Eucarya than to those of Bacteria. There is still no evidence for the existence of GlnRS in the Archaea. We propose that the last common ancestor to contemporary cells, or cenancestor, used transamidation to synthesize Gln-tRNA(Gln) and that both the Bacteria and the Archaea retained this pathway, while eukaryotes developed a specific GlnRS gene through the duplication of an existing GluRS gene. In the Bacteria, GlnRS genes have been identified in a total of 10 species from three highly diverse taxonomic groups: Thermus/Deinococcus, Proteobacteria gamma/beta subdivision, and Bacteroides/Cytophaga/Flexibacter. Although all bacterial GlnRS form a monophyletic group, the broad phyletic distribution of this tRNA synthetase suggests that multiple gene transfers from eukaryotes to bacteria occurred shortly after the Archaea-eukaryote divergence.
- Ponting CP, Aravind L, Schultz J, Bork P, Koonin EV
- Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer.
- J Mol Biol. 1999; 289: 729-45
Phyletic distributions of eukaryotic signalling domains were studied using recently developed sensitive methods for protein sequence analysis, with an emphasis on the detection and accurate enumeration of homologues in bacteria and archaea. A major difference was found between the distributions of enzyme families that are typically found in all three divisions of cellular life and non-enzymatic domain families that are usually eukaryote-specific. Previously undetected bacterial homologues were identified for plant pathogenesis-related proteins, Pad1, von Willebrand factor type A, src homology 3 and YWTD repeat-containing domains. Comparisons of the domain distributions in eukaryotes and prokaryotes enabled distinctions to be made between the domains originating prior to the last common ancestor of all known life forms and those apparently originating as consequences of horizontal gene transfer events. A number of transfers of signalling domains from eukaryotes to bacteria were confidently identified, in contrast to only a single case of apparent transfer from eukaryotes to archaea.
- Norcum MT, Dignam JD
- Immunoelectron microscopic localization of glutamyl-/prolyl-tRNA synthetase within the eukaryotic multisynthetase complex.
- J Biol Chem. 1999; 274: 12205-8
A high molecular mass complex of aminoacyl-tRNA synthetases is readily isolated from a variety of eukaryotes. Although its composition is well characterized, knowledge of its structure and organization is still quite limited. This study uses antibodies directed against prolyl-tRNA synthetase for immunoelectron microscopic localization of the bifunctional glutamyl-/prolyl-tRNA synthetase. This is the first visualization of a specific site within the multisynthetase complex. Images of immunocomplexes are presented in the characteristic views of negatively stained multisynthetase complex from rabbit reticulocytes. As described in terms of a three domain working model of the structure, in "front" views of the particle and "intermediate" views, the primary antibody binding site is near the intersection between the "base" and one "arm." In "side" views, where the particle is rotated about its long axis, the binding site is near the midpoint. "Top" and "bottom" views, which appear as square projections, are also consistent with the central location of the binding site. These data place the glutamyl-/prolyl-tRNA synthetase polypeptide in a defined area of the particle, which encompasses portions of two domains, yet is consistent with the previous structural model.
- Gray MW, Burger G, Lang BF
- Mitochondrial evolution.
- Science. 1999; 283: 1476-81
The serial endosymbiosis theory is a favored model for explaining the origin of mitochondria, a defining event in the evolution of eukaryotic cells. As usually described, this theory posits that mitochondria are the direct descendants of a bacterial endosymbiont that became established at an early stage in a nucleus-containing (but amitochondriate) host cell. Gene sequence data strongly support a monophyletic origin of the mitochondrion from a eubacterial ancestor shared with a subgroup of the alpha-Proteobacteria. However, recent studies of unicellular eukaryotes (protists), some of them little known, have provided insights that challenge the traditional serial endosymbiosis-based view of how the eukaryotic cell and its mitochondrion came to be. These data indicate that the mitochondrion arose in a common ancestor of all extant eukaryotes and raise the possibility that this organelle originated at essentially the same time as the nuclear component of the eukaryotic cell rather than in a separate, subsequent event.
- Yang D, Kusser I, Kopke AK, Koop BF, Matheson AT
- The structure and evolution of the ribosomal proteins encoded in the spc operon of the archaeon (Crenarchaeota) Sulfolobus acidocaldarius.
- Mol Phylogenet Evol. 1999; 12: 177-85
The genes for nine ribosomal proteins, L24, L5, S14, S8, L6, L18, S5, L30, and L15, have been isolated and sequenced from the spc operon in the archaeon (Crenarchaeota) Sulfolobus acidocaldarius, and the putative amino acid sequence of the proteins coded by these genes has been determined. In addition, three other genes in the spc operon, coding for ribosomal proteins S4E, L32E, and L19E (equivalent to rat ribosomal proteins S4, L32, and L19), were sequenced and the structure of the putative proteins was determined. The order of the ribosomal protein genes in the spc operon of the Crenarchaeota kingdom of Archaea is identical to that present in the Euryarchaeota kingdom of Archaea and also identical to that found in bacteria, except for the genes for r-proteins S4E, L32E, and L19E, which are absent in bacteria. Although AUG is the initiation codon in most of the spc genes, GUG (val) and UUG (leu) are also used as initiation codons in S. acidocaldarius. Over 70% of the codons in the Sulfolobus spc operon have A or U in the third position, reflecting the low GC content of Sulfolobus DNA. Phylogenetic analysis indicated that the archaeal r-proteins are a sister group of their eucaryotic counterparts but did not resolve the question of whether the Archaea is monophyletic, as suggested by the L6P, L15P, and L18P trees, or the question of whether the Crenarchaeota is separate from the Euryarchaeota and closer to the Eucarya, as suggested by the S8P, S5P, and L24P trees. In the case of the three Sulfolobus r-proteins that do not have a counterpart in the bacterial ribosome (S4E, L32E, and L19E), the archaeal r-proteins showed substantial identity to their eucaryotic equivalents, but in all cases the archaeal proteins formed a separate group from the eucaryotic proteins.
- Brinkmann H, Philippe H
- Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies.
- Mol Biol Evol. 1999; 16: 817-25
The 54-kDa signal recognition particle and the receptor SR alpha, two proteins involved in the cotranslational translocation of proteins, are paralogs. They originate from a gene duplication that occurred prior to the last universal common ancestor, allowing one to root the universal tree of life. Phylogenetic analysis using standard methods supports the generally accepted cluster of Archaea and Eucarya. However, a new method increasing the signal-to-noise ratio strongly suggests that this result is due to a long-branch attraction artifact, with the Bacteria evolving fastest. In fact, the Archaea/Eucarya sisterhood is recovered only by the fast-evolving positions. In contrast, the most slowly evolving positions, which are the most likely to retain the ancient phylogenetic signal, support the monophyly of prokaryotes. Such a eukaryotic rooting provides a simple explanation for the high similarity of Archaea and Bacteria observed in complete-genome analysis, and should prompt a reconsideration of current views on the origin of eukaryotes.
- Sankaranarayanan R et al.
- The structure of threonyl-tRNA synthetase-tRNA(Thr) complex enlightens its repressor activity and reveals an essential zinc ion in the active site.
- Cell. 1999; 97: 371-81
E. coli threonyl-tRNA synthetase (ThrRS) is a class II enzyme that represses the translation of its own mRNA. We report the crystal structure at 2.9 Å resolution of the complex between tRNA(Thr) and ThrRS, whose structural features reveal novel strategies for providing specificity in tRNA selection. These include an amino-terminal domain containing a novel protein fold that makes minor groove contacts with the tRNA acceptor stem. The enzyme induces a large deformation of the anticodon loop, resulting in an interaction between two adjacent anticodon bases, which accounts for their prominent role in tRNA identity and translational regulation. A zinc ion found in the active site is implicated in amino acid recognition/discrimination.
- Baldo AM, McClure MA
- Evolution and horizontal transfer of dUTPase-encoding genes in viruses and their hosts.
- J Virol. 1999; 73: 7710-21
dUTPase is a ubiquitous and essential enzyme responsible for regulating cellular levels of dUTP. The dut gene exists as single, tandemly duplicated, and tandemly triplicated copies. Crystallized single-copy dUTPases have been shown to assemble as homotrimers. dUTPase is encoded as an auxiliary gene in a number of virus genomes. The origin of viral dut genes has remained unresolved since their initial discovery. A comprehensive analysis of dUTPase amino acid sequence relationships was performed to explore the evolutionary dynamics of dut in viruses and their hosts. Our data set, comprised of 24 host and 51 viral sequences, includes representative sequences from available eukaryotes, archaea, eubacteria cells, and viruses, including herpesviruses. These amino acid sequences were aligned by using a hidden Markov model approach developed to align divergent data. Known secondary structures from single-copy crystals were mapped onto the aligned duplicate and triplicate sequences. We show how duplicated dUTPases might fold into a monomer, and we hypothesize that triplicated dUTPases also assemble as monomers. Phylogenetic analysis revealed at least five viral dUTPase sequence lineages in well-supported monophyletic clusters with eukaryotic, eubacterial, and archaeal hosts. We have identified all five as strong examples of horizontal transfer as well as additional potential transfer of dut genes among eubacteria, between eubacteria and viruses, and between retroviruses. The evidence for horizontal transfers is particularly interesting since eukaryotic dut genes have introns, while DNA virus dut genes do not. This implies that an intermediary retroid agent facilitated the horizontal transfer process between host mRNA and DNA viruses.
- Makarova KS et al.
- Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell.
- Genome Res. 1999; 9: 608-28
Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%-35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role of horizontal gene transfer from bacteria in the evolution of the euryarchaeota.
- Knox EB
- The use of hierarchies as organizational models in systematics
- Biol J Linn Soc Lond. 1998; 63: 1-49
A hierarchy is an abstract organizational model of inter-level relationships among entities. When isomorphic with nature, hierarchies are useful for organizing and manipulating our knowledge. Hierarchies have been used in biological systematics to represent several distinct, but interrelated, facets of the evolution of life with different organizational properties, and these distinctions have been confused by the rubric «the hierarchy of life». Evolution, as descent with modification, is inherently dualistic. The organizational structure of a hierarchy can be used to represent dualistic properties as inter-level relationships. Cladistics is monistic, with a singular focus on patterns of descent. Descent has conceptual priority over modification, but the organizational relationship is not exclusive. «Cladistic classification» is an oxymoron because cladistics lacks the class concepts needed to construct a classification, a point recognized by those who suggest abandoning Linnaean classification in favour of a newly devised monophyletic systematization. Cladistic analysis of descent can be supplemented with an analysis of modification that provides the class concepts needed to construct an evolutionary/phylogenetic classification. When a strong monophyletic pattern of modification is detected (in addition to its monophyletic pattern of descent), the criterion of subsequent modification provides the basis for formally recognizing a certain monophyletic group at a given rank, as opposed to a group that is one node more inclusive or one node less. The criterion of subsequent modification also permits detection of strong paraphyletic patterns of modification, when they exist. By setting standards of evidence needed to recognize paraphyletic groups, one concomitantly strengthens the basis for formally recognizing selective monophyletic groups. Copyright 1998 The Linnean Society of London
- Cusack S, Yaremchuk A, Krikliviy I, Tukalo M
- tRNA(Pro) anticodon recognition by Thermus thermophilus prolyl-tRNA synthetase.
- Structure. 1998; 6: 101-8
BACKGROUND: Most aminoacyl-tRNA synthetases (aaRSs) specifically recognize all or part of the anticodon triplet of nucleotides of their cognate tRNAs. Class IIa and class IIb aaRSs possess structurally distinct tRNA anticodon-binding domains. The class IIb enzymes (LysRS, AspRS and AsnRS) have an N-terminal beta-barrel domain (OB-fold); the interactions of this domain with the anticodon stem-loop are structurally well characterised for AspRS and LysRS. Four out of five class IIa enzymes (ProRS, ThrRS, HisRS and GlyRS, but not SerRS) have a C-terminal anticodon-binding domain with an alpha/beta fold, not yet found in any other protein. The mode of RNA binding by this domain is hitherto unknown as is the rationale, if any, behind classification of anticodon-binding domains for different aaRSs. RESULTS: The crystal structure of Thermus thermophilus prolyl-tRNA synthetase (ProRSTT) in complex with tRNA(Pro) has been determined at 3.5 Å resolution by molecular replacement using the native enzyme structure. One tRNA molecule, of which only the lower two-thirds is well ordered, is found bound to the synthetase dimer. The C-terminal anticodon-binding domain binds to the anticodon stem-loop from the major groove side. Binding to tRNA by ProRSTT is reminiscent of the interaction of class IIb enzymes with cognate tRNAs, but only three of the anticodon-loop bases become splayed out (bases 35-37) rather than five (bases 33-37) in the case of class IIb enzymes. The two anticodon bases conserved in all tRNA(Pro), G35 and G36, are specifically recognised by ProRSTT. CONCLUSIONS: For the synthetases possessing the class IIa anticodon-binding domain (ProRS, ThrRS and GlyRS, with the exception of HisRS), the two anticodon bases 35 and 36 are sufficient to uniquely identify the cognate tRNA (GG for proline, GU for threonine, CC for glycine), because these amino acids occupy full codon groups. 
The structure of ProRSTT in complex with its cognate tRNA shows that these two bases specifically interact with the enzyme, whereas base 34, which can be any base, is stacked under base 33 and makes no interactions with the synthetase. This is in agreement with biochemical experiments which identify bases 35 and 36 as major tRNA identity elements. In contrast, class IIb synthetases (AspRS, AsnRS and LysRS) have a distinct anticodon-binding domain that specifically recognises all three anticodon bases. This again correlates with the requirements of the genetic code for cognate tRNA identification, as the class IIb amino acids occupy half codon groups.
- Brown JR, Zhang J, Hodgson JE
- A bacterial antibiotic resistance gene with eukaryotic origins.
- Curr Biol. 1998; 8: R365-7
- Karlin S, Brocchieri L
- Heat shock protein 70 family: multiple sequence comparisons, function, and evolution.
- J Mol Evol. 1998; 47: 565-77
The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize better the alignments than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level). 
All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins.
- Shiba K, Motegi H, Yoshida M, Noda T
- Human asparaginyl-tRNA synthetase: molecular cloning and the inference of the evolutionary history of Asx-tRNA synthetase family.
- Nucleic Acids Res. 1998; 26: 5045-51
We have cloned and sequenced a cDNA encoding human cytoplasmic asparaginyl-tRNA synthetase (AsnRS). The N-terminal appended domain of 112 amino acids represents the signature sequence for the eukaryotic AsnRS and is absent from archaebacterial or eubacterial enzymes. The canonical ortholog for AsnRS is absent from most archaebacterial and some eubacterial genomes, indicating that in those organisms, formation of asparaginyl-tRNA is independent of the enzyme. The high degree of sequence conservation among asparaginyl- and aspartyl-tRNA synthetases (AsxRS) made it possible to infer the evolutionary paths of the two enzymes. The data show the neighbor relationship between AsnRS and eubacterial aspartyl-tRNA synthetase, and support the occurrence of AsnRS early in the course of evolution, which is in contrast to the proposed late occurrence of glutaminyl-tRNA synthetase.
- Budin K, Philippe H
- New insights into the phylogeny of eukaryotes based on ciliate Hsp70 sequences.
- Mol Biol Evol. 1998; 15: 943-56
The current framework of the eukaryotic phylogeny is based on the analysis of a comprehensive set of sequences of the small subunit ribosomal RNA. However, phylogenies based on protein-encoding genes are not completely congruent with this picture. Since congruence between different markers is the best tool to determine evolutionary history, we focused on Hsp70 (heat-shock protein of 70 kDa), a chaperone protein which is highly conserved and is a potentially reliable phylogenetic marker. We used a PCR-based approach to sequence Hsp70s in two distinct classes of Ciliates. Seven Hsp70s were identified from Paramecium tetraurelia (Oligohymenophora) and six Hsp70s from Euplotes aediculatus (Hypotricha), encompassing orthologous genes for all major Hsp70 classes of Eukaryotes, i.e., those localized in cytosol, in endoplasmic reticulum, and in mitochondria. Three independent phylogenies of eukaryotes, based on each set of orthologous genes, have been constructed using different tree reconstruction methods. A significant advantage of Hsp70s is the existence of outgroups close to Eukaryotes for these major classes, reducing the long-branch attraction artifact due to the outgroup. The monophyly of Ciliates is supported by good bootstrap proportions in the phylogenetic reconstructions, and this phylum is generally a sister-group of Sporozoa, forming the expected Alveolates clade. The Hsp70 seems to be a suitable phylogenetic marker since it recovers all the monophyletic groups, undoubtedly defined by morphological criteria. The Hsp70 trees are, however, notably different from the rRNA ones and do not show two aspects of the classical topology, i.e., the successive emergence of deeply branching groups and the vast assembly of the major eukaryotic groups, emerging at the tip of the tree, i.e., the "terminal crown". 
More precisely, the Hsp70 trees do not resolve the relationships between the major groups of Eukaryotes with confidence, in keeping with the hypothesis that all these groups emerged in a great radiation that occurred at the origin of all the extant Eukaryotes.
- Siatecka M, Rozek M, Barciszewski J, Mirande M
- Modular evolution of the Glx-tRNA synthetase family--rooting of the evolutionary tree between the bacteria and archaea/eukarya branches.
- Eur J Biochem. 1998; 256: 80-7
The accuracy of protein biosynthesis generally rests on a family of 20 aminoacyl-tRNA synthetases, one for each amino acid. In bacteria, archaea and eukaryotic organelles, the formation of Gln-tRNA(Gln) is prevalently accomplished by a transamidation pathway, aminoacylation of tRNA(Gln) with Glu by glutamyl-tRNA synthetase (GluRS) followed by a tRNA-dependent transamidation of Glu from Glu-tRNA(Gln). A few bacterial species, such as Escherichia coli, possess a glutaminyl-tRNA synthetase (GlnRS), responsible for Gln-tRNA(Gln) formation. Phylogenetic analysis of the GluRS or GlnRS families (GlxRS) suggested that GlnRS has a eukaryotic origin and was horizontally transferred to a restricted set of bacteria. We have now isolated an additional GlnRS gene from the plant Lupinus luteus and analyzed in more detail the modular architecture of the paralogous enzymes GluRS and GlnRS, starting from a large data set of 33 GlxRS sequences. Our analysis suggests that the ancestral GluRS-like enzyme was solely composed of the catalytic domain bearing the class-defining motifs of aminoacyl-tRNA synthetases, and that the anticodon-binding domain of GlxRSs was independently acquired in the bacteria and archaea branches of the universal tree of life, the eukarya sub-branch arising as a sister group of archaea. The transient capture of UAA and UAG codons could have favored the emergence of a GlnRS in early eukaryotes.
- Gupta RS
- Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes.
- Microbiol Mol Biol Rev. 1998; 62: 1435-91
The presence of shared conserved insertions or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes.
- Ribeiro S, Golding GB
- The mosaic nature of the eukaryotic nucleus.
- Mol Biol Evol. 1998; 15: 779-88
The phylogenies for each of the protein-coding genes from the Methanococcus jannaschii genome were surveyed to determine the history of the major groups of life. For each gene, homologous sequences from other archaea, eucarya, and Gram-positive and Gram-negative bacteria were collected and aligned, and a phylogeny was reconstructed with a maximum-likelihood algorithm. The majority of significant phylogenies favor the eucarya and the archaea as sister groups. A smaller, but still substantial, portion of these significant phylogenies favor a eucarya/Gram-negative clade. These results indicate that support for the early history of life is not unequivocal. A chimeric origin of eukaryotes or an ancient, massive horizontal transfer of genes from Gram-negative bacteria to eucarya can explain many of the observed phylogenies.
- Hamel F, Boivin R, Tremblay C, Bellemare G
- Structural and evolutionary relationships among chitinases of flowering plants.
- J Mol Evol. 1997; 44: 614-24
The analysis of nuclear-encoded chitinase sequences from various angiosperms has allowed the categorization of the chitinases into discrete classes. Nucleotide sequences of their catalytic domains were compared in this study to investigate the evolutionary relationships between chitinase classes. The functionally distinct class III chitinases appear to be more closely related to fungal enzymes involved in morphogenesis than to other plant chitinases. The ordering of other plant chitinases into additional classes mainly relied on the presence of auxiliary domains (namely, a chitin-binding domain and a carboxy-terminal extension) flanking the main catalytic domain. The results of our phylogenetic analyses showed that classes I and IV form discrete and well-supported monophyletic groups derived from a common ancestral sequence that predates the divergence of dicots and monocots. In contrast, other sequences included in classes I* and II, lacking one or both types of auxiliary domains, were nested within class I sequences, indicating that they have a polyphyletic origin. According to phylogenetic analyses and the calculation of evolutionary rates, these chitinases probably arose from different class I lineages by relatively recent deletion events. The occurrence of such evolutionary trends in cultivated plants and their potential involvement in host-pathogen interactions are discussed.
- Aberg A, Yaremchuk A, Tukalo M, Rasmussen B, Cusack S
- Crystal structure analysis of the activation of histidine by Thermus thermophilus histidyl-tRNA synthetase.
- Biochemistry. 1997; 36: 3084-94
The crystal structure at 2.7 Å resolution of histidyl-tRNA synthetase (HisRS) from Thermus thermophilus in complex with its amino acid substrate histidine has been determined. In the crystal asymmetric unit there are two homodimers, each subunit containing 421 amino acid residues. Each monomer of the enzyme consists of three domains: (1) an N-terminal catalytic domain containing a six-stranded antiparallel beta-sheet and the three motifs common to all class II aminoacyl-tRNA synthetases, (2) a 90-residue C-terminal alpha/beta domain which is common to most class IIa synthetases and is probably involved in recognizing the anticodon stem-loop of tRNA(His), and (3) a HisRS-specific alpha-helical domain inserted into the catalytic domain, between motifs II and III. The position of the insertion domain above the catalytic site suggests that it could clamp onto the acceptor stem of the tRNA during aminoacylation. Two HisRS-specific peptides, 259-RGLDYY and 285-GGRYDG, are intimately involved in forming the binding site for the histidine, a molecule of which is found in the active site of each monomer. The structure of HisRS in complex with histidyl adenylate, produced enzymatically in the crystal, has been determined at 3.2 Å resolution. This structure shows that the HisRS-specific Arg-259 interacts directly with the alpha-phosphate of the adenylate on the opposite side to the usual conserved motif 2 arginine. Arg-259 thus substitutes for the divalent cation observed in seryl-tRNA synthetase and plays a crucial catalytic role in the mechanism of histidine activation.
- Beyer A
- Sequence analysis of the AAA protein family.
- Protein Sci. 1997; 6: 2043-58
The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases.
- Pascual J, Castresana J, Saraste M
- Evolution of the spectrin repeat.
- Bioessays. 1997; 19: 811-7
We now know that the evolution of multidomain proteins has frequently involved genetic duplication events. These, however, are sometimes difficult to trace because of low sequence similarity between duplicated segments. Spectrin, the major component of the membrane skeleton that provides elasticity to the cell, contains tandemly repeated sequences of 106 amino acid residues. The same repeats are also present in alpha-actinin, dystrophin and utrophin. Sequence alignments and phylogenetic trees of these domains allow us to interpret the evolutionary relationship between these proteins, concluding that spectrin evolved from alpha-actinin by an elongation process that included two duplications of a block of seven repeats. This analysis shows how a modular protein unit can be used in the evolution of large cytoskeletal structures.
- Keeling PJ, Doolittle WF
- Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family.
- Mol Biol Evol. 1996; 13: 1297-305
The tubulin gene family, which includes alpha-, beta-, and gamma-tubulin subfamilies, is composed of highly conserved proteins which are the principal structural and functional components of eukaryotic microtubules. We are interested in (1) establishing when in eukaryotic evolution the duplications leading to paralogous alpha, beta, and gamma subfamilies occurred and (2) the possible utility of tubulin sequences in reconstructing organismal phylogeny. To broaden the taxonomic representation of alpha-tubulins so that it roughly equals that of beta-tubulins, alpha-tubulin genes from three Microsporidia (Encephalitozoon hellem, Nosema locustae, and Spraguea lophii), two Parabasalia (Monocercomonas sp. and Trichomitus batrachorum), and one Heterolobosean (Acrasis rosea) were sequenced. With these new genes, phylogenetic trees of alpha- and beta-tubulins were constructed and compared. Trees were congruent with each other, but incongruent with other molecular phylogenies. The agreement between alpha- and beta-tubulin trees could arise by the co-adaptation of one molecule to variants of the other as a result of their intimate steric association in microtubules. Thus, these trees may not be providing independent support for the phylogenetic results. However, one of these unexpected results, that microsporidia cluster with fungi, is supported by other circumstantial evidence, and may therefore reflect a real relationship despite the basal position usually assigned to microsporidia. Relationships between the three tubulins were also examined by constructing trees of all three types. These trees were found to be of limited value for determining the position of the root within each subfamily because of the great interfamily distances, but they do confirm the classification of all known genes into three monophyletic subfamilies.
Divergent genes from Caenorhabditis elegans and Saccharomyces cerevisiae that have been proposed to represent the novel classes delta- and epsilon-tubulin were found to be specifically related to gamma-tubulins from animals and fungi respectively, and therefore are best seen as rapidly evolving orthologues of gamma-tubulin.
- Dias HW, Aboud M, Flugel RM
- Analysis of the phylogenetic placement of different spumaretroviral genes reveals complex pattern of foamy virus evolution.
- Virus Genes. 1995; 11: 183-90
Foamy or spumaviruses are complex retroviruses. Phylogenetic trees constructed previously for either the polymerase or integrase domains showed a clustering of the foamy viruses relatively distant from other retroviruses. The most closely related retrovirus was found to be murine leukemia virus, irrespective of the method used or foamy viral gene analyzed. We analyzed bel genes of different foamy viruses and compared the corresponding phylogenetic trees with those obtained from the pol genes that were constructed with refined computer programs. In addition, the nucleocapsid protein sequence of foamy viruses was used for a comparative phylogenetic analysis. Known biological properties of the individual FV protein domains are discussed to ascertain the apparent phylogenetic relatedness.
- Bork P, Holm L, Koonin EV, Sander C
- The cytidylyltransferase superfamily: identification of the nucleotide-binding site and fold prediction.
- Proteins. 1995; 22: 259-66
The crystal structure of glycerol-3-phosphate cytidylyltransferase from B. subtilis (TagD) is about to be solved. Here, we report a testable structure prediction based on the identification by sequence analysis of a superfamily of functionally diverse but structurally similar nucleotide-binding enzymes. We predict that TagD is a member of this family. The most conserved region in this superfamily resembles the ATP-binding HiGH motif of class I aminoacyl-tRNA synthetases. The predicted secondary structure of cytidylyltransferase and its homologues is compatible with the alpha/beta topography of the class I aminoacyl-tRNA synthetases. The hypothesis of similarity of fold is strengthened by sequence-structure alignment and 3D model building using the known structure of tyrosyl tRNA synthetase as template. The proposed 3D model of TagD is plausible both structurally, with a well packed hydrophobic core, and functionally, as the most conserved residues cluster around the putative nucleotide binding site. If correct, the model would imply a very ancient evolutionary link between class I tRNA synthetases and the novel cytidylyltransferase superfamily.
- Fani R, Lio P, Lazcano A
- Molecular evolution of the histidine biosynthetic pathway.
- J Mol Evol. 1995; 41: 760-74
The available sequences of genes encoding the enzymes associated with histidine biosynthesis suggest that this is an ancient metabolic pathway that was assembled prior to the diversification of the Bacteria, Archaea, and Eucarya. Paralogous duplications, gene elongation, and fusion events involving different his genes have played a major role in shaping this biosynthetic route. Evidence that the hisA and the hisF genes and their homologues are the result of two successive duplication events that apparently took place before the separation of the three cellular lineages is extended. These two successive gene duplication events as well as the homology between the hisH genes and the sequences encoding the TrpG-type amidotransferases support the idea that during the early stages of metabolic evolution at least parts of the histidine biosynthetic pathway were mediated by enzymes of broader substrate specificities. Maximum likelihood trees calculated for the available sequences of genes encoding these enzymes have been obtained. Their topologies support the possibility of an evolutionary proximity of archaebacteria with low GC Gram-positive bacteria. This observation is consistent with those detected by other workers using the sequences of heat-shock proteins (HSP70), glutamine synthetases, glutamate dehydrogenases, and carbamoylphosphate synthetases.
- Mosyak L, Reshetnikova L, Goldgur Y, Delarue M, Safro MG
- Structure of phenylalanyl-tRNA synthetase from Thermus thermophilus.
- Nat Struct Biol. 1995; 2: 537-47
The crystal structure of phenylalanyl-tRNA synthetase from Thermus thermophilus, solved at 2.9 Å resolution, displays (alpha beta)2 subunit organization. Unexpectedly, both the catalytic alpha- and the non-catalytic beta-subunits comprise the characteristic fold of the class II active-site domains. The alpha beta heterodimer contains most of the building blocks so far identified in the class II synthetases. The presence of an RNA-binding domain, similar to that of the U1A spliceosomal protein, in the beta-subunit is indicative of structural relationships among different families of RNA-binding proteins. The structure suggests a plausible catalytic mechanism which explains why the primary site of tRNA aminoacylation is different from that of the other class II enzymes.
- Delarue M
- Aminoacyl-tRNA synthetases.
- Curr Opin Struct Biol. 1995; 5: 48-55
Detailed mechanisms for each step of the reaction catalyzed by both class I and class II aminoacyl-tRNA synthetases have been proposed on the basis of crystallographic data of aminoacyl-tRNA synthetases in complex with their different substrates. Despite the very different topologies of the two classes, there are striking and unanticipated chemical similarities between their active sites and proposed mechanisms.
- Gupta RS
- Evolution of the chaperonin families (Hsp60, Hsp10 and Tcp-1) of proteins and the origin of eukaryotic cells.
- Mol Microbiol. 1995; 15: 1-11
The members of the 10 kDa and 60 kDa heat-shock chaperonin proteins (Hsp10 and Hsp60 or Cpn10 and Cpn60), which form an operon in bacteria, are present in all eubacteria and eukaryotic cell organelles such as mitochondria and chloroplasts. In archaebacteria and eukaryotic cell cytosol, no close homologues of Hsp10 or Hsp60 have been identified. However, these species (or cell compartments) contain the Tcp-1 family of proteins (distant homologues of Hsp60). Phylogenetic analysis based on global alignments of Hsp60 and Hsp10 sequences presented here provides some evidence regarding the evolution of mitochondria from a member of the alpha-subdivision of Gram-negative bacteria and chloroplasts from cyanobacterial species, respectively. This inference is strengthened by the presence of sequence signatures that are uniquely shared between Hsp60 homologues from alpha-purple bacteria and mitochondria on one hand, and the chloroplasts and cyanobacterial hsp60s on the other. Within the alpha-purple subdivision, species such as Rickettsia and Ehrlichia, which live intracellularly within eukaryotic cells, are indicated to be the closest relatives of mitochondrial homologues. In the Hsp60 evolutionary tree, rooted using the Tcp-1 homologue, the order of branching of the major groups was as follows: Gram-positive bacteria--cyanobacteria and chloroplasts--chlamydiae and spirochaetes--beta- and gamma-Gram-negative purple bacteria--alpha-purple bacteria--mitochondria. A similar branching order was observed independently in the Hsp10 tree. Multiple Hsp60 homologues, when present in a group of species, were found to be clustered together in the trees, indicating that they evolved by independent gene-duplication events. This review also considers in detail the evolutionary relationship between Hsp60 and Tcp-1 families of proteins based on two different models (viz. archaebacterial and chimeric) for the origin of eukaryotic cell nucleus.
Some predictions of the chimeric model are also discussed.
- Marsh TL, Reich CI, Whitelock RB, Olsen GJ
- Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes.
- Proc Natl Acad Sci U S A. 1994; 91: 4180-4
The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archaeal and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer) it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archaeal protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archaeal protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.
- Sanz JL, Huber G, Huber H, Amils R
- Using protein synthesis inhibitors to establish the phylogenetic relationships of the Sulfolobales order.
- J Mol Evol. 1994; 39: 528-32
The sensitivity of the cell-free protein synthesis systems from Acidianus brierleyi, Acidianus infernus, and Metallosphaera sedula, members of the archaeal order Sulfolobales, to 40 antibiotics with different specificities has been studied. The sensitivity patterns were compared to those of Sulfolobus solfataricus and other archaeal, bacterial, and eukaryotic systems. The comparative analysis shows that ribosomes from the Sulfolobales are the most refractory to inhibitors of protein synthesis described so far. The sensitivity results have been used to ascertain the phylogenetic relationships among the members of the order Sulfolobales. The evolutionary significance of these results is analyzed in the context of the phylogenetic position of this group of extreme thermophilic microorganisms.
- Kisselev LL, Wolfson AD
- Aminoacyl-tRNA synthetases from higher eukaryotes.
- Prog Nucleic Acid Res Mol Biol. 1994; 48: 83-142
- Fagan MJ, Saier MH Jr
- P-type ATPases of eukaryotes and bacteria: sequence analyses and construction of phylogenetic trees.
- J Mol Evol. 1994; 38: 57-99
The amino acid sequences of 47 P-type ATPases from several eukaryotic and bacterial kingdoms were divided into three structural segments based on individual hydropathy profiles. Each homologous segment was (1) multiply aligned and functionally evaluated, (2) statistically analyzed to determine the degrees of sequence similarity, and (3) used for the construction of parsimonious phylogenetic trees. The results show that all of the P-type ATPases analyzed comprise a single family with four major clusters correlating with their cation specificities and biological sources as follows: cluster 1: Ca(2+)-transporting ATPases; cluster 2: Na(+)- and gastric H(+)-ATPases; cluster 3: plasma membrane H(+)-translocating ATPases of plants, fungi, and lower eukaryotes; and cluster 4: all but one of the bacterial P-type ATPases (specific for K+, Cd2+, Cu2+ and an unknown cation). The one bacterial exception to this general pattern was the Mg(2+)-ATPase of Salmonella typhimurium, which clustered with the eukaryotic sequences. Although exceptions were noted, the similarities of the phylogenetic trees derived from the three segments analyzed led to the probability that the N-terminal segments 1 and the centrally localized segments 2 evolved from a single primordial ATPase which existed prior to the divergence of eukaryotes from prokaryotes. By contrast, the C-terminal segments 3 appear to be eukaryotic specific, are not found in similar form in any of the prokaryotic enzymes, and are not all demonstrably homologous among the eukaryotic enzymes. These C-terminal domains may therefore have either arisen after the divergence of eukaryotes from prokaryotes or exhibited more rapid sequence divergence than either segment 1 or 2, thus masking their common origin. The relative rates of evolutionary divergence for the three segments were determined to be segment 2 < segment 1 < segment 3. 
Correlative functional analyses of the most conserved regions of these ATPases, based on published site-specific mutagenesis data, provided preliminary evidence for their functional roles in the transport mechanism. Our studies define the structural and evolutionary relationships among the P-type ATPases. They should provide a guide for the design of future studies of structure-function relationships employing molecular genetic, biochemical, and biophysical techniques.
- Goodson HV
- Molecular evolution of the myosin superfamily: application of phylogenetic techniques to cell biological questions.
- Soc Gen Physiol Ser. 1994; 49: 141-57
We have used distance matrix and maximum parsimony methods to study the evolutionary relationships between members of the myosin superfamily of molecular motors. Amino acid sequences of the conserved core of the motor region were used in the analysis. Our results show that myosins can be divided into at least three main classes, with two types of unconventional myosin being no more related to each other than they are to conventional myosin. Myosins have traditionally been classified as conventional or unconventional, with many of the unconventional myosin proteins thought to be distributed in a narrow range of organisms. We find that members of all three of these main classes are likely to be present in most (or all) eukaryotes. Three proteins do not cluster within the three main groups and may each represent additional classes. The structure of the trees suggests that these ungrouped proteins and some of the subclasses of the main classes are also likely to be widely distributed, implying that most eukaryotic cells contain many different myosin proteins. The groupings derived from phylogenetic analysis of myosin head sequences agree strongly with those based on tail structure, developmental expression, and (where available) enzymology, suggesting that specific head sequences have been tightly coupled to specific tail sequences throughout evolution. Analysis of the relationships within each class has interesting implications. For example, smooth muscle myosin and striated muscle myosin seem to have independently evolved from nonmuscle myosin. Furthermore, brush border myosin I, a type of protein initially thought to be specific to specialized metazoan tissues, probably has relatives that are much more broadly distributed.
- Di Giulio M
- The evolution of aminoacyl-tRNA synthetases, the biosynthetic pathways of amino acids and the genetic code.
- Orig Life Evol Biosph. 1992; 22: 309-19
In this paper the partition metric is used to compare binary trees deriving from (i) the study of the evolutionary relationships between aminoacyl-tRNA synthetases, (ii) the physicochemical properties of amino acids and (iii) the biosynthetic relationships between amino acids. If the tree defining the evolutionary relationships between aminoacyl-tRNA synthetases is assumed to be a manifestation of the mechanism that originated the organization of the genetic code, then the results appear to indicate the following: the hypothesis that regards the genetic code as a map of the biosynthetic relationships between amino acids seems to explain the organization of the genetic code, at least as plausibly as the hypotheses that consider the physicochemical properties of amino acids as the main adaptive theme that led to the structuring of the code.
- Schimmel P, Shepard A, Shiba K
- Intron locations and functional deletions in relation to the design and evolution of a subgroup of class I tRNA synthetases.
- Protein Sci. 1992; 1: 1387-91
- Felsenstein J
- Counting phylogenetic invariants in some simple cases.
- J Theor Biol. 1991; 152: 357-76
An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants testing independence of evolutionary events in different lineages, and linear phylogenetic invariants which occur when there is a molecular clock. Most of the linear invariants found by Cavender (1989, Molec. Biol. Evol. 6, 301-316) turn out in the Jukes-Cantor case to be simple tests of symmetry of the substitution model, and not phylogenetic invariants.
- Schimmel P
- Classes of aminoacyl-tRNA synthetases and the establishment of the genetic code.
- Trends Biochem Sci. 1991; 16: 1-3
- Jensen RA, Ahmad S
- Nested gene fusions as markers of phylogenetic branchpoints in prokaryotes.
- Trends Ecol Evol. 1990; 5: 219-24
Phylogenetic trees for prokaryotic microorganisms are being assembled at a rapid pace, primarily through sequence comparisons of ribosomal RNA genes. For lineages that diverged from the ancestral stem at nearly the same time, the order of branching may be uncertain. The problem applies both to minor branches that separated very recently and to major branches that diverged long ago. Bifunctional proteins produced by gene fusion provide the clarity of a plus-or-minus character state, and analysis of the distribution of gene-fusion patterns can reveal the order of phylogenetic branching.
- Sankoff D
- Designer invariants for large phylogenies.
- Mol Biol Evol. 1990; 7: 255-69
The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of "customized" invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants as functions of the theoretical frequencies f(UVXY) of observable character configurations (U at organism 1, V at 2, etc.). The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model and for individual edges in 6-trees or, indeed, in any size tree. The same technique allows us to formulate invariants for entire trees, but these are cubic functions for 6-trees and are higher-degree polynomials for larger trees. With k-valued characters and, especially, with large trees, the types of configuration sets (events) used in the simpler examples are too rare (i.e., their predicted frequencies are too low) to be useful, and the construction of meaningful pairs of independent events becomes an important and nontrivial task in designing invariants suited to testing specific hypotheses. In a very natural way, this approach fits in with well-known statistical methodology for contingency tables. We explore use of events such as "only transitions occur for character i (i.e., position i in a nucleic acid sequence) in subtree a" in analyzing a set of data on ribosomal RNA in the context of the controversy over the origins of archaebacteria, eubacteria, and eukaryotes.
- Rauhut R, Gabius HJ, Cramer F
- Phenylalanyl-tRNA synthetases as an example for comparative and evolutionary aspects of aminoacyl-tRNA synthetases.
- Biosystems. 1986; 19: 173-83
Aminoacyl-tRNA synthetases are indispensable components of protein synthesis in all three lines of evolutionary descent, eubacteria, archaebacteria and eukaryotes. Furthermore they are also present in the translational apparatus of the semi-autonomous organelles, mitochondria and chloroplasts, of the eukaryotic cell. Therefore aminoacyl-tRNA synthetases are appropriate objects for comparative molecular biology in order to obtain a comprehensive picture of the evolution of the translational process. The analysis of the phenylalanyl-tRNA synthetase in a large variety of organisms and organelles in this respect is the most advanced. In addition to comparison of quaternary structure, analysis includes functional aspects of accuracy mechanisms (proofreading) and comparison of structural features by means of substrate analogs. Evolutionary relationships are furthermore elucidated using the immunological approach and heterologous aminoacylation.
- Dene H, Goodman M, Walz DA, Romero-Herrera AE
- The phylogenetic position of aardvark (Orycteropus afer) as suggested by its myoglobin.
- Hoppe Seylers Z Physiol Chem. 1983; 364: 1585-95
Skeletal muscle myoglobin of the aardvark (Orycteropus afer) was isolated and its primary structure determined. The amino-acid sequence was then used in conjunction with previously established myoglobin sequences to evaluate the phylogenetic relationships of the aardvark. The most parsimonious trees constructed from this myoglobin sequence data either alone or when combined with lens alpha-crystallin A sequence data depict the aardvark lineage as one of the most ancient among Eutheria.
- Kwok Y, Wong JT
- Evolutionary relationship between Halobacterium cutirubrum and eukaryotes determined by use of aminoacyl-tRNA synthetases as phylogenetic probes.
- Can J Biochem. 1980; 58: 213-8
The cross-species reactivities between tRNAs and aminoacyl-tRNA synthetases have been employed as a basis to estimate the relatedness of various prokaryotes to the eukaryotes. The tRNA of Halobacterium cutirubrum, unlike that of other prokaryotes tested, including Agrobacterium tumefaciens, Arthrobacter luteus, Bacillus subtilis, Bacillus stearothermophilus, Escherichia coli, Micrococcus luteus, Myxococcus xanthus, Rhodopseudomonas spheroides, and Thermus aquaticus, was found to share with yeast, rat liver, and wheat germ tRNA a distinct preference for aminoacylation by eukaryotic synthetases from yeast as opposed to prokaryotic synthetases from either E. coli or R. spheroides. These results suggest that phylogenetically H. cutirubrum is more closely related to the eukaryotes than to the eubacteria.
- Thompson LH, Lofgren DJ, Adair GM
- Evidence for structural gene alterations affecting aminoacyl-tRNA synthetases in CHO cell mutants and revertants.
- Somatic Cell Genet. 1978; 4: 423-35
Aminoacyl-tRNA synthetase (aaRS) activities in extracts of mutant strains of the Chinese hamster ovary line (CHO) were examined for alterations in thermal stability. Mutants having low activity for MetRS, AsnRS, or GlnRS contained aaRSs that were inactivated much more rapidly upon heating than those from wild-type cells. Revertant lines, isolated from cultures of these mutants (Asn-5, Met-2, and Gln-2) after treatment with nitrosoguanidine or ethyl methanesulfonate, had thermolabilities intermediate between mutant and wild-type, and consistently had higher activities than the mutants. With a modified in vivo aminoacylation procedure, two previously exceptional mutants, Arg-1 and His-1, showed pronounced reductions in the amount of arginyl-tRNA or histidyl-tRNA, respectively, under restrictive conditions, compared to wild type. Revertants of Arg-1 (like the mutant itself) had no measurable ArgRS in vitro activity (less than 0.4% of wild type) although in vivo aminoacylation in the one revertant tested was partially restored. These data provide evidence that the forward mutations have occurred in the structural genes of the aaRSs and that most of the reversions are probably the result of second-site point mutations in the aaRS genes.