Publication Date: 06 May 2013
Type: Original Research
Journal: Evolutionary Bioinformatics
Citation: Evolutionary Bioinformatics 2013:9 185-202
doi: 10.4137/EBO.S11609
We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and consequently more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
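To make the top-down idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an ungapped toy alignment given as equal-length strings, uses mean pairwise Hamming distance as a stand-in split criterion, and stops at a fixed depth instead of applying the post-pruning step described above. The names `mean_pairwise_distance`, `best_split`, `build_tree`, and `classify` are hypothetical helpers introduced only for this example.

```python
from itertools import combinations


def mean_pairwise_distance(seqs):
    """Average Hamming distance over all pairs (0.0 for fewer than two sequences)."""
    if len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)


def best_split(seqs):
    """Return (score, position, residue, yes, no) for the best test, or None.

    A test asks "does the sequence have `residue` at alignment column
    `position`?". The score is the size-weighted mean pairwise distance
    inside the two resulting groups (lower is better). This criterion is an
    assumption made for illustration only.
    """
    best = None
    for pos in range(len(seqs[0])):
        for residue in sorted({s[pos] for s in seqs}):
            yes = [s for s in seqs if s[pos] == residue]
            no = [s for s in seqs if s[pos] != residue]
            if not yes or not no:
                continue  # the test does not divide the cluster
            score = (len(yes) * mean_pairwise_distance(yes)
                     + len(no) * mean_pairwise_distance(no)) / len(seqs)
            if best is None or score < best[0]:
                best = (score, pos, residue, yes, no)
    return best


def build_tree(seqs, max_depth=1):
    """Divide the cluster top-down, recording the test used at each division."""
    split = best_split(seqs) if max_depth > 0 and len(seqs) > 1 else None
    if split is None:
        return {"leaf": seqs}
    _, pos, residue, yes, no = split
    return {
        "test": (pos, residue),
        "yes": build_tree(yes, max_depth - 1),
        "no": build_tree(no, max_depth - 1),
    }


def classify(tree, seq):
    """Route a new aligned sequence to a cluster using only the stored tests."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]


# Toy alignment with two obvious subgroups.
alignment = ["ACDE", "ACDF", "GCHE", "GCHF"]
tree = build_tree(alignment, max_depth=1)
```

Because every internal node stores a (position, residue) test, a new sequence can be assigned to a cluster by checking a handful of alignment columns, which is the property the abstract refers to when it says that the identified mutations alone allow classification.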