Evolutionary Bioinformatics

Fast Structural Search in Phylogenetic Databases

Authors: Jason T. L. Wang, Huiyuan Shan, Dennis Shasha and William H. Piel

Publication Date: 20 Feb 2007

Evolutionary Bioinformatics Online 2005:1 37-46

Jason T. L. Wang¹, Huiyuan Shan², Dennis Shasha³ and William H. Piel⁴

¹Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ, USA; ²Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ, USA; ³Courant Institute of Mathematical Sciences, New York University, New York, NY, USA; ⁴Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, USA.

Abstract: As the size of phylogenetic databases grows, so does the need to search these databases efficiently. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies 𝒟, find trees in 𝒟 that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree D in 𝒟. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees, where the trees can be weighted or unweighted. Experimental results comparing the similarity measure with existing tree metrics and evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.

Categories: Bioinformatics, Evolutionary bioinformatics, Research methodologies, Phylogenetics

Keywords: Structural pattern matching, structural search and retrieval, tree search strategies, phylogenetic trees.
