De novo gene birth
De novo gene birth is the process by which new
Although de novo gene birth may have occurred at any point in an organism's evolutionary history, ancient de novo gene birth events are difficult to detect. Most studies of de novo genes to date have thus focused on young genes, typically taxonomically restricted genes (TRGs) that are present in a single species or lineage, including so-called
Although de novo gene birth was once viewed as a highly unlikely occurrence, several unequivocal examples have now been described, and some researchers speculate that de novo gene birth could play a major role in evolutionary innovation.
As early as the 1930s,
In the same year, however, Pierre-Paul Grassé coined the term "overprinting" to describe the emergence of genes through the expression of alternative
Exonization is a special case of de novo gene birth in which intronic sequences, often repetitive ones, acquire splice sites through mutation and give rise to de novo exons. This was first described in 1994 in the context of Alu sequences found in the coding regions of primate mRNAs. Such de novo exons are frequently found in minor splice variants, which may allow the evolutionary “testing” of novel sequences while retaining the functionality of the major splice variant(s).
Still, some researchers thought that most or all eukaryotic proteins were constructed from a constrained pool of “starter type” exons. Using the sequence data available at the time, a 1991 review estimated the number of unique, ancestral eukaryotic exons to be fewer than 60,000, while a 1992 article estimated that the vast majority of proteins belonged to no more than 1,000 families. Around the same time, however, the sequence of chromosome III of the budding yeast
In 2006 and 2007, a series of studies provided arguably the first documented examples of de novo gene birth that did not involve overprinting. An analysis of the accessory gland transcriptomes of
Despite their recent origin, all five genes appear fixed in D. melanogaster, and the presence of paralogous non-coding sequences that are absent in close relatives suggests that four of the five genes may have arisen through a recent intrachromosomal duplication event. All five were preferentially expressed in the testes of male flies (see below). The three genes for which complete ORFs exist in both D. melanogaster and D. simulans showed evidence of rapid evolution and positive selection. This is consistent with a recent emergence of these genes, as young, novel genes typically undergo adaptive evolution, but it also makes it difficult to be certain that the candidates encode truly functional products. A subsequent study using methods similar to those of Levine et al. and an
Three of these genes are extremely short (<90 bp), suggesting that they may be RNA genes, although several examples of very short functional peptides have also been documented. Around the time these Drosophila studies were published, a homology search of genomes from all domains of life, including 18 fungal genomes, identified 132 fungal-specific proteins, 99 of which were unique to S. cerevisiae.
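To make the length criterion concrete, the following is a minimal sketch of how open reading frames might be located in a DNA sequence and flagged when they fall under the 90 bp mark mentioned above. It is not the method used in the studies described here. The function names (find_orfs, is_very_short) are hypothetical, only the forward strand is scanned, the standard stop codons are assumed, and ORF length is taken to include the stop codon.

```python
# Illustrative sketch only: a naive forward-strand ORF scan.
# Assumptions (not from the studies discussed in the text): ORFs run from
# the first ATG in a frame to the next in-frame stop codon, and the
# reported length includes the stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_len=0):
    """Return a list of (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon, so end - start is the ORF length in base pairs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def is_very_short(orf, threshold_bp=90):
    """True if the ORF is shorter than threshold_bp base pairs."""
    start, end, _frame = orf
    return end - start < threshold_bp


# Usage: a toy 9 bp ORF and a toy 96 bp ORF.
short_orfs = find_orfs("ATGAAATAG")
long_orfs = find_orfs("ATG" + "GCT" * 30 + "TAA")
```

An ORF of fewer than 90 bp can encode at most 28 amino acids once the stop codon is excluded, which is why such candidates raise the question of whether the functional product is a peptide or an RNA.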
Since these initial studies, many groups have identified specific cases of de novo gene birth events in diverse organisms. The BSC4 gene in S. cerevisiae, identified in 2008, shows evidence of purifying selection, is expressed at both the mRNA and protein levels, and when deleted is synthetically lethal with two other yeast genes, all of which indicate a functional role for the BSC4 gene product. Historically, one argument against the notion of widespread de novo gene birth has been the evolved complexity of protein folding. Bsc4, however, was later shown to adopt a partially folded state that combines properties of native and non-native protein folding. Another well-characterized example in yeast is MDF1, which both represses mating efficiency and promotes vegetative growth, and is intricately regulated by a conserved antisense ORF. In plants, the first de novo gene to be functionally characterized was QQS, an