Top-Down Clustering for Protein Subfamily Identification

Eduardo P. Costa; Celine Vens; Hendrik Blockeel

JOURNAL

Evolutionary Bioinformatics


Authors: Eduardo P. Costa, Celine Vens and Hendrik Blockeel

Publication Date: 06 May 2013

Type: Original Research

Journal: Evolutionary Bioinformatics

Citation: Evolutionary Bioinformatics 2013:9 185-202

doi: 10.4137/EBO.S11609


Abstract

We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and therefore more accurate subfamily identification, while additionally indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
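To make the top-down idea concrete, the following is a minimal sketch of divisive clustering over aligned sequences, in which every division is tied to a test of the form "alignment position p holds residue r". It is an illustration under stated assumptions, not the authors' implementation: the split score (size-weighted mean pairwise Hamming distance), the `min_size` stopping rule used in place of post-pruning, and all function names are hypothetical choices made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of alignment columns in which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(seqs):
    """Average Hamming distance over all pairs (0.0 for fewer than two)."""
    if len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def best_split(seqs, min_size):
    """Find the (position, residue) test giving the most compact subclusters.

    The score is an assumed heuristic, not the one used in the paper.
    Returns (score, position, residue, inside, outside) or None.
    """
    best = None
    for pos in range(len(seqs[0])):
        for residue in sorted({s[pos] for s in seqs}):
            inside = [s for s in seqs if s[pos] == residue]
            outside = [s for s in seqs if s[pos] != residue]
            if len(inside) < min_size or len(outside) < min_size:
                continue
            score = (len(inside) * mean_pairwise_distance(inside)
                     + len(outside) * mean_pairwise_distance(outside))
            if best is None or score < best[0]:
                best = (score, pos, residue, inside, outside)
    return best


def build_tree(seqs, min_size=2):
    """Recursively divide the sequences top-down; leaves are clusters."""
    split = best_split(seqs, min_size)
    if split is None:
        return {"leaf": sorted(seqs)}
    _, pos, residue, inside, outside = split
    return {
        "test": (pos, residue),  # the mutation associated with this division
        "yes": build_tree(inside, min_size),
        "no": build_tree(outside, min_size),
    }


def classify(tree, seq):
    """Assign a new aligned sequence to a cluster by following the tests."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]
```

Because each internal node stores the position and residue that define its division, a new aligned sequence can be placed in a cluster simply by checking those positions, which mirrors the abstract's point that the identified mutations alone allow new sequences to be classified.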

