
Why focus on large viruses?
Mimivirus genome surprises
"Bag of genes" vs reductive evolution
Understanding the origin of large DNA viruses

Why focus on large viruses?

Following the spectacular achievement of Barrell's team in sequencing the 230 kb genome of human cytomegalovirus (Human herpesvirus 5) as early as 1990 [1], numerous larger viral genomes [Table 1] have been added to the databases without generating much emotion or provoking any significant change in the notion of virus that prevails in the general community of biologists. In our collective subconscious, viruses are still thought of as highly optimized, minimal "bags of genes", packaging just enough information to deal with host infection and hijack the host machinery into making more particles. Given the simplicity of a minimal particle (a few capsid proteins and a core protein for genome packaging), a viral genome is thus expected to carry fewer than a dozen genes. Retroviruses (RNA viruses that rely on RNA-to-DNA reverse transcription) are the archetype. Together with the concept of viral genomes as a minimal set of genes came the notion that - with the exception of the capsid gene (the origin of which is quite mysterious) - most viral genes were simply stolen from their hosts, and thus did not contain many phylogenetic clues about the origin of viruses as a whole.
In summary, the prevalent view is that viruses are lucky combinations of self-replicating molecules - not living organisms - and as such not noble enough to deserve a genealogy.
The identification over the last 14 years of many large DNA viruses exhibiting more than two hundred, then three hundred genes (Table 1) was apparently not sufficient to change this view outside of a small circle of specialists.
With the discovery of mimivirus, the first virus to overlap with parasitic cellular organisms in terms of size (400 nm) [2] and genome complexity (over 1000 genes) [3], we now believe we have a unique opportunity to challenge the conservative attitude about viruses, and to thoroughly revisit the concept and biological significance of viruses (at least large DNA viruses). A useful first move would be to depart from the world of regular viruses by using a new name (for instance Girus, Archevirus, etc.) to refer to mimivirus and future cell-sized viruses.

Mimivirus genome surprises

The surprise in the mimivirus genome was not only its record 1.2-Mb size, but the nature of its gene content itself. The detailed analysis of the mimivirus genome yields three different lines of evidence [3].
1) Mimivirus is a "regular" nucleocytoplasmic large DNA virus (NCLDV): it contains all the core genes that have been identified as strictly or most often conserved in previously described pox-, irido-, asfar- and phycodnaviruses.
2) Yet an analysis of the most similar homologs of mimivirus genes, as well as its pattern of loss of facultative NCLDV "core genes", does not suggest an affinity with any of the established NCLDV families. Mimivirus appears to be the first representative of the Mimiviridae, a proposed new NCLDV group.
3) In addition to a normal complement of NCLDV gene homologs, mimivirus exhibits many genes encoding functions never before encountered in any virus, including eight components central to protein translation: four aminoacyl-tRNA synthetases (aaRS) together with four translation factors that between them cover the initiation, elongation and termination steps. The enzymatic activity of mimivirus tyrRS has been experimentally verified.

This first encounter of genes encoding translation apparatus proteins in a virus violates a well-established dogma. Lacking ribosomes, viruses are supposed to hand over the synthesis of their own proteins entirely to their host's machinery. Already weakened by the discovery of numerous tRNAs in some phycodnaviruses [4], this dogma is now further demolished by the finding of virus-encoded translation enzymes and factors. Recall that aaRS are responsible for the correct application of the genetic code, ensuring that the proper amino acid is loaded on the right tRNA. The discovery of the first viral homologs of translation components also has important practical consequences. Like the DNA and RNA polymerases, aaRS have homologs within all domains of life: Archaea, Eubacteria and Eukarya. They could thus be used to improve the phylogenetic analysis of NCLDVs and to clarify their branching on the "Tree of Life", the root of which is the mythical Last Universal Common Ancestor (LUCA). The big surprise was then to see mimivirus defining its own branch, independent enough to suggest the existence of a fourth domain of life [Figure 1]. The precise scenario of what happened 3 billion years ago is probably lost forever, but the phylogenetic position of mimivirus (and by extension of other NCLDVs) is at least consistent with the hypothesis that the organism that provided the mimivirus core gene set was already in existence at the time of (or prior to) the emergence of the first eukaryotic cells. Such an ancestral anchoring of mimivirus to the Tree of Life is also consistent with previously proposed ideas linking DNA viruses to the emergence of the nucleus [5]. Thus, large DNA viruses could be promoted from the lowest status of non-living parasites to that of bona fide ancestors of our own cells!

"Bag of genes" vs reductive evolution

If we believe this new genealogy, the gene content of large viruses now takes on an entirely new meaning. As simple "bags of genes", viruses were hopeless in terms of evolutionary reconstruction. The presence or absence of genes in a given viral genome (with very little overlap from one virus to the next) was seen as the result of erratic captures from random host encounters, and not loaded with much general biological significance. As viral genome sizes keep increasing, finally reaching the level of complexity of the smallest cellular organisms (Table 1), it becomes less and less tenable that so many genes would have been randomly acquired (and kept) for no good reason. This is at least quite inconsistent with the idea that viruses are somewhat "optimized" parasites (what's the point of acquiring a partial protein translation apparatus?). As the credibility of the "bag of genes" concept decreases, the opposite notion, that these large DNA viruses might represent various stages of reductive evolution from a more complex ancestor, is gaining more support. In this context, mimivirus might represent some kind of living fossil, the least degraded extant relative of an ancestral organism at the origin of many diverse virus families (the NCLDVs and others). The parallel is easy to make with the world of strictly parasitic intracellular bacteria (many are listed in Table 1). While strictly adapting to their various hosts, these bacteria (Chlamydia, Treponema, Mycoplasma, Rickettsia, etc.) have lost large chunks of their ancestral gene complements and retained largely non-overlapping parts of metabolism. As a result, they share fewer than 100 core genes, which encode essential components of DNA and RNA processing as well as protein translation [Clusters of Orthologous Groups (COG) Database]. DNA viruses might be the result of a different sort of parasitic reductive evolution, one along which protein translation was NOT conserved.
In this context, considering Buchnera or Rickettsia, but not large DNA viruses, as bona fide "living organisms" appears quite arbitrary.

Understanding the origin of large DNA viruses

If we now accept that the different gene contents of large DNA viruses are the results of various reductive evolutionary pathways from a common ancestor (possibly mixed with occurrences of laterally transferred genes, as is the case for parasitic bacteria), then studying the phylogeny of viral genes and applying global comparative approaches to the increasing number of large DNA virus genomes can be used to draw a tentative picture of the ancestral life form at the origin of today's DNA viruses. The purpose of WWW.GiantVirus.org is to help in this endeavor.

Figure 1. Mimivirus branching in the Tree of Life. None of the genes used to generate this tree exhibited evidence of recent lateral transfer.

Organism                                Code        Genome size (bp)   Date
Mimivirus                               Y653733     1,181,404          Nov 2004
Treponema pallidum                      NC_000919   1,138,011          Sep 2001
Rickettsia prowazekii                   NC_000963   1,111,523          Sep 2001
Chlamydia muridarum                     NC_002620   1,072,950          Oct 2001
Chlamydia trachomatis                   NC_000117   1,042,519          Sep 2001
Mycoplasma pulmonis                     NC_002771   963,879            Oct 2001
Tropheryma whipplei                     NC_004572   927,303            Feb 2003
Onion yellows phytoplasma               NC_005303   860,631            Dec 2003
Mycoplasma pneumoniae                   NC_000912   816,394            Apr 2001
Mycoplasma mobile                       NC_006908   777,079            May 2004
Ureaplasma parvum                       NC_002162   751,719            Jan 2000
Wigglesworthia glossinidia              NC_004344   697,724            Jul 2003
Buchnera aphidicola                     NC_004545   615,980            Jan 2003
Mycoplasma genitalium                   NC_000908   580,074            Jan 2001
Nanoarchaeum equitans                   NC_005213   490,885            Feb 2004
Canarypox virus                         NC_005309   359,853            Jan 2004
Ectocarpus siliculosus virus            NC_002687   335,593            Feb 2001
Paramecium bursaria Chlorella virus 1   NC_000852   330,743            Feb 1996
Shrimp white spot syndrome virus        NC_003225   305,107            Nov 2001
Human herpesvirus 5                     NC_001347   230,287            Mar 1990

Table 1. The largest viral genomes compared to the smallest prokaryote genomes. Cellular organisms with a genome complexity lower than that of mimivirus are listed in red (redundant species are not listed). Polydnavirus is not listed, since its large (560 kb) atypical genome encodes only 156 genes [6]. For all other organisms, there is approximately one gene per kb. A partial sequence (498 kb) of the bacteriophage G genome (estimated length: 670 kb) is available from the Pittsburgh Bacteriophage Institute (http://pbi.bio.pitt.edu/).
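The gene-density remark in the caption can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, the polydnavirus figures (156 genes, about 560 kb) and the genome sizes are taken from the caption and table above, and the "one gene per kb" rule is the caption's approximation, so the resulting gene counts are rough estimates rather than annotated values.

```python
# Rough gene-density arithmetic using figures quoted in the caption and table.
# The "one gene per kb" rule is the caption's approximation, not a measurement.

GENES_PER_KB_RULE = 1.0  # approximate density stated in the caption


def gene_density(gene_count: int, genome_bp: int) -> float:
    """Return genes per kilobase for a genome of genome_bp base pairs."""
    return gene_count / (genome_bp / 1000)


def estimated_genes(genome_bp: int, genes_per_kb: float = GENES_PER_KB_RULE) -> int:
    """Estimate a gene count from genome size under the rule of thumb."""
    return round(genome_bp / 1000 * genes_per_kb)


# Polydnavirus: 156 genes in roughly 560 kb (figures from the caption).
polydnavirus_density = gene_density(156, 560_000)

# Genome sizes (bp) copied from the table; the selection is arbitrary.
table_sizes = {
    "Mimivirus": 1_181_404,
    "Mycoplasma genitalium": 580_074,
    "Human herpesvirus 5": 230_287,
}
estimates = {name: estimated_genes(bp) for name, bp in table_sizes.items()}

if __name__ == "__main__":
    print(f"Polydnavirus: {polydnavirus_density:.2f} genes/kb")
    for name, count in estimates.items():
        print(f"{name}: ~{count} genes expected at 1 gene/kb")
```

Under these assumptions the polydnavirus genome comes out at well under one gene per kb, which is why the caption treats it as atypical, while the mimivirus estimate is in line with the "over 1000 genes" quoted in the text.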


1. Chee MS, Bankier AT, Beck S, et al. Analysis of the protein-coding content of the sequence of human cytomegalovirus strain AD169. Curr. Top. Microbiol. Immunol. 1990; 154: 125-169.
2. La Scola B, Audic S, Robert C, et al. A giant virus in Amoebae. Science 2003; 299: 2033.
3. Raoult D, Audic S, Robert C, et al. The 1.2-Megabase Genome Sequence of Mimivirus. Science Express 2004 (14 Oct 2004).
4. Van Etten JL, Graves MV, Müller DG, et al. Phycodnaviridae - large DNA algal viruses. Arch. Virol. 2002; 147: 1479-1516.
5. Pennisi E. The birth of the nucleus. Science 2004; 305: 766.
6. Espagne E, Dupuy C, Huguet E, et al. Genome sequence of a polydnavirus: insights into symbiotic virus evolution. Science 2004; 306: 286-289.