Modelling prokaryote gene content
Matthew Spencer¹, Edward Susko¹, and Andrew J. Roger²
¹Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada. ²Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
Abstract: The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely used models, in which only one gene can be gained or lost at a time) to blocks models (which allow gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly related Archaeoglobus fulgidus (an archaeon) and Bacillus subtilis (a Gram-positive bacterium). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
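The contrast between the two model classes can be illustrated with a toy simulation of the number of genes in a single family. This is only a sketch under stated assumptions, not the parameterisation used in the paper: the function name `simulate_family_size`, the per-gene `gain_rate` and `loss_rate`, and the uniform choice of block size up to `max_block` are all hypothetical. Setting `max_block=1` gives a birth-death process in which each event changes the family size by exactly one gene; a larger `max_block` lets a single event gain or lose several genes at once.

```python
import random


def simulate_family_size(n0, gain_rate, loss_rate, t_end, max_block=1, seed=0):
    """Gillespie-style simulation of the size of one gene family.

    max_block=1 is a birth-death process (one gene per event).
    max_block>1 is a toy 'blocks' variant: each event gains or loses a
    uniformly chosen number of genes between 1 and max_block.
    Rates are per gene; all names and the uniform block-size choice are
    illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    # Size 0 is absorbing here: this toy has no lateral transfer term
    # that could reintroduce a lost family.
    while n > 0:
        total = (gain_rate + loss_rate) * n
        if total <= 0:
            break  # no events can occur
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        block = rng.randint(1, max_block)
        if rng.random() < gain_rate / (gain_rate + loss_rate):
            n += block
        else:
            n = max(0, n - block)  # cannot lose more genes than are present
    return n


# Compare the two variants with the same rates and starting size.
birth_death = [simulate_family_size(5, 0.4, 0.5, 2.0, max_block=1, seed=s)
               for s in range(1000)]
blocks = [simulate_family_size(5, 0.4, 0.5, 2.0, max_block=3, seed=s)
          for s in range(1000)]
print("birth-death family lost:", sum(x == 0 for x in birth_death) / 1000)
print("blocks family lost:     ", sum(x == 0 for x in blocks) / 1000)
```

The printed fractions estimate how often the family has disappeared by time `t_end` under each variant. Fitting such models to real genome pairs requires likelihood calculations that this sketch does not attempt.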