Publication Date: 14 Oct 2010
Type: Short Report
Journal: Cancer Informatics
doi: 10.4137/CIN.S5613
Background: While trying to integrate multiple data sets collected by different researchers, we noticed that the sample names were frequently entered inconsistently. Most of the variations appeared to involve punctuation, white space, or their absence, at the juncture between alphabetic and numeric portions of the cell line name.
Results: Reasoning that the variant names could be described in terms of mutations or deletions of character strings, we implemented a simple version of the Needleman-Wunsch global sequence alignment algorithm and applied it to the cell line names. This procedure found all correct matches. Incorrect matches occurred only when a cell line was present in one data set but not in the other, and the raw match scores tended to be substantially worse for the incorrect matches.
Conclusions: A simple application of the Needleman-Wunsch global sequence alignment algorithm provides a useful first pass at matching sample names from different data sets.
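The approach summarized above can be illustrated with a minimal sketch of Needleman-Wunsch scoring applied to name strings. This is not the authors' implementation: the function names `nw_score` and `best_match`, the scoring values (match +1, mismatch -1, gap -1), the case folding, and the example cell line names are all assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two strings (Needleman-Wunsch).

    Scoring values are illustrative assumptions, not taken from the paper.
    Names are upper-cased first so that case differences are ignored.
    """
    a, b = a.upper(), b.upper()
    # Row 0: aligning the empty prefix of `a` against each prefix of `b`
    # costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap inserted in `b`
            left = curr[j - 1] + gap  # gap inserted in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def best_match(name, candidates):
    """Return (candidate, score) for the highest-scoring candidate."""
    scored = [(c, nw_score(name, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical name lists; not data from the paper.
    other_set = ["MCF7", "HL60", "K562"]
    for query in ["MCF-7", "HL 60", "A549"]:
        print(query, best_match(query, other_set))
```

In this sketch a punctuation or white-space variant costs a single gap, so it still scores close to the exact name, while a name with no true partner in the other list (the last query) gets a clearly lower best score. That is consistent with the observation that raw scores were worse for incorrect matches, and it suggests reviewing low-scoring pairs by hand rather than accepting them automatically.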
Copyright © 2013 Libertas Academica Ltd (except open access articles and accompanying metadata and supplementary files.)