JOURNAL

Genomics Insights

Single Nucleotide Polymorphisms Caused by Assembly Errors

Publication Date: 04 Feb 2010

Type: Original Research

Journal: Genomics Insights

Citation: Genomics Insights 2010:3 1-8

doi: 10.4137/GEI.S3653

Abstract

We compare the results of three assembler programs, Celera, Phrap and Mira2, on the same set of about a hundred thousand Sanger reads derived from an unknown bacterial genome. In contrast to previous assembly comparisons, we do not focus on speed of computation or numbers of assembled contigs but on how well the different sequence assemblies agree in content. Genome regions assembled consistently by all three programs are identified in order to estimate a lower bound on the number of erroneously identified single nucleotide polymorphisms (SNPs) caused by nothing but the process of mathematical sequence assembly. We identified 509 sequence triplets common to all three de novo assemblies, spanning only 34% (3.3 Mb) of the bacterial genome, with 175 of these regions (∼1.5 Mb) including erroneous SNPs and insertions/deletions. Within these triplets this leads, on average, to one error per 7,155 base pairs. When the assembler Mira2 is replaced by the most recent version, Mira3, this figure even drops to one error per 5,923 base pairs. Our results therefore suggest that a considerable number of erroneous SNPs may be present in current sequence data, and that mathematicians should urgently take up research on the numerical stability of sequence assembly algorithms. Furthermore, even the latest versions of currently used assemblers produce erroneous SNPs that depend on the order in which reads are supplied as input. Such errors will severely hamper molecular diagnostics as well as efforts to relate genome variation to disease. This issue needs to be addressed urgently, as the field is moving fast into clinical applications.
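
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the paper: the function names are hypothetical, the inputs are the rounded values from the abstract (3.3 Mb, 34%, 7,155 bp, 5,923 bp), and it assumes the error rate is averaged over the full 3.3 Mb triplet span. The derived numbers are therefore rough implications, not results reported by the authors.

```python
# Rounded values taken from the abstract.
TRIPLET_SPAN_BP = 3_300_000      # "3.3 Mb" covered by the 509 triplets
GENOME_FRACTION = 0.34           # triplets span "34%" of the genome
BP_PER_ERROR_MIRA2 = 7_155       # one error per 7,155 bp (with Mira2)
BP_PER_ERROR_MIRA3 = 5_923       # one error per 5,923 bp (with Mira3)


def implied_error_count(span_bp: float, bp_per_error: float) -> float:
    """Approximate number of errors implied by an average error spacing.

    Assumption: the spacing is averaged over the whole span.
    """
    return span_bp / bp_per_error


def implied_genome_size(span_bp: float, fraction: float) -> float:
    """Approximate genome size implied by a span and its genome fraction."""
    return span_bp / fraction


def relative_error_density(bp_per_error_a: float, bp_per_error_b: float) -> float:
    """How many times denser errors are at spacing b than at spacing a."""
    return bp_per_error_a / bp_per_error_b


if __name__ == "__main__":
    errors = implied_error_count(TRIPLET_SPAN_BP, BP_PER_ERROR_MIRA2)
    genome = implied_genome_size(TRIPLET_SPAN_BP, GENOME_FRACTION)
    ratio = relative_error_density(BP_PER_ERROR_MIRA2, BP_PER_ERROR_MIRA3)
    print(f"Implied errors in the triplets (Mira2): about {errors:.0f}")
    print(f"Implied genome size: about {genome / 1e6:.1f} Mb")
    print(f"Error density with Mira3 relative to Mira2: about {ratio:.2f}x")
```

Because the abstract does not state the triplet span obtained with Mira3, the sketch compares only the two error spacings rather than computing an error count for Mira3.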

