Phylogenesis of Histone H3: How to study the evolution of Histone H3?


    The very slow evolution of core histones presents unique challenges.

  • Phylogeny is the study of evolutionary relationships among species.
      It takes the observable differences in nucleotide and protein sequences among species as data and deduces from them the changes that must have occurred in DNA relative to the common ancestral state and species.
      Thus, phylogenetic analysis deduces the evolutionary relationship among slowly evolving species based on the more rapid changes in evolving genes and proteins.
  • Histone H3 proteins evolve more slowly than species.
      Apparently, core histone genes like histone H3 exist under strong purifying selection pressure, reflecting the multiple complex nucleosomal processes that involve core histones.

    Standard phylogenetic methods applied to the histone H3 protein data set
    • yield some insight into histone H3 evolution across major evolutionary branches and kingdoms, but
    • fail to reflect the evolution of H3 protein among more closely related species.
    Standard phylogenetic methods may mis-identify [1-2] [see H3 evolution page] the origin of histone H3 gene duplication and functional differentiation because the strong purifying selective pressure leads to co-evolutionary changes of functionally distinct histone H3 genes within organisms.

  • A more faithful representation of the evolution of histone H3 proteins, including the ancient duplication and functional diversification into RC and RI H3 variants, is obtained by superimposing histone H3 protein sequences on the speciation branching pattern of all eukaryotes.
    Our current consensus phylogenetic understanding of speciation is reflected in the Tree of Life which was used, with success, to trace the evolution of histone H3.
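One way to picture superimposing H3 residues on a fixed speciation tree is "small parsimony": the branching pattern is taken as given, and only the ancestral residues and the minimum number of changes are deduced. The sketch below applies Fitch's algorithm to a single alignment column. It is an assumed stand-in, since the text does not say which procedure was used to deduce ancestral forms; the tree shape, the species labels sp1 to sp4 and the residues are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (bifurcating node).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_state: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one alignment column.

    Returns the set of candidate ancestral residues at this node and the
    minimum number of residue changes required in the subtree below it.
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left_set, left_cost = fitch_sets(tree[0], leaf_state)
    right_set, right_cost = fitch_sets(tree[1], leaf_state)
    shared = left_set & right_set
    if shared:
        # Both subtrees can agree on a residue: no change needed here.
        return shared, left_cost + right_cost
    # No residue in common: one change must have occurred on a branch below.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical species tree and residues at one H3 position.
species_tree: Tree = (("sp1", "sp2"), ("sp3", "sp4"))
residues = {"sp1": "A", "sp2": "A", "sp3": "S", "sp4": "S"}
root_candidates, changes = fitch_sets(species_tree, residues)
```

Because the tree is fixed in advance, the result cannot be distorted by H3 variants within one organism resembling each other more than their counterparts in other species; the sequences only decorate the branches.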
    1. Data sources for "data mining".
      The author of these ideas gratefully acknowledges that this research project is wholly dependent on the tireless efforts of organizations, humans and robots to sequence more genomes from more and more organisms.
    • A 1996 analysis of the existence of two H3 histones in animals, plants and Tetrahymena was based on just 272 histone H3 sequences from 81 species (55 genera) and reached an incorrect conclusion. [1]
    • The present analysis is based on more than 4,500 histone H3 sequences from some 600 species across more than 400 genera.

    H3 sequences were collected by text and tblastn searches of the following public databases:
    2. Histones database
      A Microsoft SQL Server 2000 database named "Histones" was developed to collect original histone database records, manipulated and translated sequences, and various index values including taxonomic information.
      Annotations are added in sequence records to document variant interpretation of sequence information such as splice sites, and corrections for frame shift and other data errors observed, typically based on multi-sequence alignments in MEGA of sequences from the same organism.
    3. Sequence Translation Analysis
      Vector NTI Advance™ 10 (2005), module Vector NTI®, by Invitrogen was employed for nucleotide sequence translation and intron-exon boundary analysis. This program is available for non-commercial use from www.invitrogen.com/bioinformatics
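The translation step can be illustrated with a minimal sketch: exons are joined using 1-based inclusive coordinates and the resulting coding sequence is translated with the standard genetic code. This is not Vector NTI's implementation; the function names `splice_exons` and `translate` and the short nucleotide strings are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice_exons(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments given as 1-based, inclusive (start, end) pairs."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cds: str) -> str:
    """Translate a coding sequence up to, but not including, the first stop.

    Codons containing ambiguous bases are reported as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Illustrative sequence with a 10-base intron (GT...AG) after the second codon.
genomic = "ATGGCT" + "GTAAGTTCAG" + "CGTTAA"
protein = translate(splice_exons(genomic, [(1, 6), (17, 22)]))
```

A frame shift or a misplaced splice site shows up immediately in such a translation as a run of unexpected residues or a premature stop, which is why alignment against other sequences from the same organism is a useful check.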
    4. Sequence Alignment and Phylogenetic Analysis
      MEGA3 Molecular Evolutionary Genetics Analysis, Version 3.1 was obtained from www.megasoftware.net and used for data manipulation and analysis until August 2007 when it was upgraded to MEGA4. [5-6]
    • Raw nucleotide sequences were prepared for analysis in the MEGA Text File Editor.
    • Nucleotide and translated protein sequences were aligned manually using the MEGA Alignment Editor and saved as MEGA database objects (.mas files).
    • Phylogenetic bootstrap analyses on aligned consensus H3 protein sequences were performed on MEGA .meg data files using the Neighbor-Joining, Maximum Parsimony and Minimum Evolution methods.
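The two ingredients of a distance-based bootstrap analysis can be sketched briefly: a pairwise distance between aligned protein sequences, and resampling of alignment columns with replacement. The sketch is not MEGA's implementation and omits tree building itself; the p-distance as the distance measure, the function names and the toy alignment are assumptions made for illustration.

```python
import random
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    names = sorted(alignment)
    return {
        (p, q): p_distance(alignment[p], alignment[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def bootstrap_replicate(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, same columns for all rows."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical names and residues).
toy = {"seqA": "ARTKQ", "seqB": "ASTKQ", "seqC": "ARTKQ"}
rng = random.Random(0)
replicate_distances = [distance_matrix(bootstrap_replicate(toy, rng))
                       for _ in range(100)]
```

With a protein as conserved as H3, most columns are identical across species, so many replicates contain few or no informative columns. This is one way to see why bootstrap support among closely related species tends to be weak.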
    5. Phylogenetic and Taxonomic data sources
    • Tree of Life: tolweb.org/tree [7] was used as the major source of phylogenetic information. It represents a consensus phylogeny of speciation on Earth, as far as that has been determined by gene- and genome-based phylogenetic analysis.
      Evolutionary relationships between many species and divergent branches on the Tree of Life have been determined. This is marked by defined bifurcate tree branches, as shown below for the evolution of animals and fungi.
      For many organisms, phylogenetic data have not yet been collected, or are too sparse to establish evolutionary relationships with any degree of confidence. In the Tree of Life, this is marked by non-bifurcating branching, as illustrated below for the undefined relationships among stramenopiles, alveolates, rhodophyta and green plants.

 
      Where phylogenetic information is still missing, taxonomic classifications were used in the hope that this would best approximate evolutionary relationships, awaiting phylogenetic evidence.
    • The NCBI Entrez Taxonomy database was used as the primary basis for species nomenclature.
    • A recent paper by Adl and coworkers [8] was used as a secondary source for taxonomic information on protists that are absent from the Tree of Life compilation.
    6. Presentation of Histone H3 sequence data
    • The full consensus sequence of histone H3 proteins in selected organisms, and the deduced ancestral forms of histone H3 are presented as complete protein sequences.
    • A tree branching structure that reflects Tree-of-Life phylogenetic speciation is shown for those species where sufficient histone H3 information is available to establish an unambiguous or highly likely H3 protein sequence.
      • Histone H3 protein residues are numbered from 1 to 135 in the typical H3 protein present in higher eukaryotic cells, assigning the number '0' to the translation-initiating methionine which is removed from the mature H3 protein.
      • Selected and consensus H3 protein sequences are marked by the unique name used in the Histones database, typically a unique GenBank identifier.
      • Identical histone sequences are shown in the same color.
      • Deduced changes in the H3 protein are labeled with the residue number and the standard single-letter amino acid codes of the residues deduced before and after the change. In addition, a color change generally helps to identify a protein sequence variation.
      • Deduced gene duplication events, including associated amino acid changes, are marked by a black bar.
      • Multiple distinct histone H3 proteins that co-exist in the same species are shown as parallel lines that differ in color. Co-expression of all H3 variant forms is typically assumed but often has not been confirmed experimentally.
    • The H3 protein evolutionary tree is presented as simple but large PDF files, one for each of the major phyla: animals, plants/green algae, and fungi. These trees are constructed using phylogenetic relationships, where available.
    • Some tree branches and most deep roots are more likely based on taxonomy because even semi-solid evidence for evolutionary relationships is still missing. This limit on the histone H3 evolutionary analysis, especially for lower eukaryotic clades, is currently being ameliorated by broad evolutionary analyses of multiple genes/proteins across broad representations of fundamental clades [see references 9-13].
    • The tree-like branching patterns of demonstrated (or assumed) speciation with superimposed changes in H3 proteins can be browsed within Acrobat Reader. Note that a single tree may need to be printed on virtual paper as large as 12 inches by 44 inches to retain full legibility.
    • All aligned histone H3 protein sequences are provided as images of the tables, sorted alphabetically or as presented in the phylogenetic tree. These images are fully browser compatible: just scroll across the information.
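The residue numbering and change labels described in this section can be sketched in a few lines. Because the translation-initiating methionine is numbered 0, the residue number of a full-length sequence coincides with its zero-based string index. The function name `h3_changes`, the assumption of gap-free equal-length sequences and the two short example strings are hypothetical and serve only as illustration.

```python
from typing import List


def h3_changes(ancestor: str, descendant: str) -> List[str]:
    """List residue changes as '<before><number><after>' labels.

    Both sequences must include the initiating methionine (numbered 0) and
    be aligned without gaps, so that the string index is the residue number.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must have equal length")
    return [
        f"{before}{number}{after}"
        for number, (before, after) in enumerate(zip(ancestor, descendant))
        if before != after
    ]


# Illustrative N-terminal fragments only, not full H3 sequences.
labels = h3_changes("MARTKQTARKS", "MARTKQSARKS")
```

Here `labels` holds a single entry, `T6S`, meaning that residue 6 changed from threonine to serine along the branch.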

References

  1. Waterborg JH, Robertson AJ. "Common features of analogous replacement histone H3 genes in animals and plants." J. Mol. Evol. 43, 194-206, 1996.
  2. Thatcher TH, MacGaffey J, Bowen J, Horowitz S, Shapiro DL, Gorovsky MA. "Independent evolutionary origin of histone H3.3-like variants of animals and Tetrahymena." Nucleic Acids Res. 22, 180-186, 1994.
  3. Mariño-Ramírez L, Hsu B, Baxevanis A, Landsman D. "The Histone Database: a comprehensive resource for histones and histone fold-containing proteins." Proteins 62(4), 838-842, 2006.
  4. Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. "The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide." Nucleic Acids Res. 34, D332-334, 2006.
  5. Kumar S, Tamura K, Nei M. "MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment." Briefings in Bioinformatics 5, 150-163, 2004.
  6. Tamura K, Dudley J, Nei M, Kumar S. "MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0." Mol. Biol. Evol. 24, 1596-1599, 2007.
  7. Maddison DR, Schulz KS (eds). "The Tree of Life Web Project." 1996-2006. Internet address: http://tolweb.org
  8. Adl SM, Simpson AGB, Farmer MA, Andersen RA, Anderson OR, Barta JR, Bowser SS, Brugerolle G, Fensome RA, Fredericq S, James TY, Karpov S, Kugrens P, Krug J, Lane CE, Lewis LA, Lodge J, Lynn DH, Mann DG, McCourt RM, Mendoza L, Moestrup O, Mozley-Standridge SE, Nerad TA, Shearer CA, Smirnov AV, Spiegel FW, Taylor MFJR. "The new higher level classification of eukaryotes with emphasis on the taxonomy of protists." J. Eukaryot. Microbiol. 52, 399-451, 2005.
  9. Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Loffelhardt W, Bohnert HJ, Philippe H, Lang BF. "Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes." Curr. Biol. 15, 1325-1330, 2005.
  10. Burki F, Pawlowski J. "Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts." Mol. Biol. Evol. 23, 1922-1930, 2006.
  11. Simpson AG, Inagaki Y, Roger AJ. "Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of "primitive" eukaryotes." Mol. Biol. Evol. 23, 615-625, 2006.
  12. Rodriguez-Ezpeleta N, Brinkmann H, Burger G, Roger AJ, Gray MW, Philippe H, Lang BF. "Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans." Curr. Biol. 17, 1420-1425, 2007.
  13. Ruiz-Trillo I, Burger G, Holland PW, King N, Lang BF, Roger AJ, Gray MW. "The origins of multicellularity: a multi-taxon genome initiative." Trends Genet. 23, 113-118, 2007.

© 2007 Jakob Waterborg.  E-mail <WaterborgJ@umkc.edu>