Testing the Accuracy of Eukaryotic Phylogenetic Profiles for Prediction of Biological Function
Saurav Singh and Dennis P. Wall
Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115.
Abstract
A phylogenetic profile captures the pattern of gene gain and loss throughout evolutionary time. Proteins that interact directly or indirectly within the cell to perform a biological function will often co-evolve, and this co-evolution should be well reflected within their phylogenetic profiles. Thus similar phylogenetic profiles are commonly used for grouping proteins into functional groups. However, it remains unclear how the size and content of the phylogenetic profile impacts the ability to predict function, particularly in Eukaryotes. Here we developed a straightforward approach to address this question by constructing a complete set of phylogenetic profiles for 31 fully sequenced Eukaryotes. Using Gene Ontology as our gold standard, we compared the accuracy of functional predictions made by a comprehensive array of permutations on the complete set of genomes. Our permutations showed that phylogenetic profiles containing between 25 and 31 Eukaryotic genomes performed equally well and significantly better than all other permuted genome sets, with one exception: we uncovered a core of group of 18 genomes that achieved statistically identical accuracy. This core group contained genomes from each branch of the eukaryotic phylogeny, but also contained several groups of closely related organisms, suggesting that a balance between phylogenetic breadth and depth may improve our ability to use Eukaryotic specific phylogenetic profiles for functional annotations.
Readers of this also read: