Universal Features for the Classification of Coding and Non-Coding DNA Sequences

Nicolas Carels; Ramon Vidal; Diego Frías

Authors: Nicolas Carels, Ramon Vidal and Diego Frías

Publication Date: 03 Jun 2009

Type: Original Research

Journal: Bioinformatics and Biology Insights

Citation: Bioinformatics and Biology Insights 2009:3 37-49

doi: 10.4137/BBI.S2236

Abstract

In this report, we revisited simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. The range of codon usage covered by our sequence sample is wide, which suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate >95% for sequences longer than 350 bp. The coding strand and coding frame are implicitly deduced when a sequence is classified as coding.
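
As an illustration only, the four features listed in the abstract could be computed per reading frame along the lines of the sketch below. It is not the authors' implementation. It assumes the standard genetic code for stop codons, estimates each probability as a simple base frequency at a triplet position, and reads feature (iv) as G in the 1st position and C in the 2nd. The function and key names (`frame_features`, `six_frame_features`, `purine_product`, and so on) are hypothetical, and no classification thresholds are included because the abstract does not state any.

```python
from collections import Counter

# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def frame_features(seq, frame=0):
    """Compute the four abstract-level features for one reading frame.

    Probabilities are estimated as base frequencies at each of the three
    triplet positions, using complete triplets only.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence too short for this frame")
    n = len(codons)
    # counts[pos][base] = number of triplets with `base` at position `pos`
    counts = [Counter(codon[pos] for codon in codons) for pos in range(3)]

    def p(base, pos):
        return counts[pos][base] / n

    purine = [p("A", pos) + p("G", pos) for pos in range(3)]
    return {
        # (i) stop codon count in this frame
        "stop_codons": sum(codon in STOP_CODONS for codon in codons),
        # (ii) product of purine probabilities over the three positions
        "purine_product": purine[0] * purine[1] * purine[2],
        # (iii) P(C at 1st) * P(G at 2nd) * P(A at 3rd)
        "cga_product": p("C", 0) * p("G", 1) * p("A", 2),
        # (iv) P(G at 1st) * P(C at 2nd), our reading of the abstract
        "gc_product": p("G", 0) * p("C", 1),
    }


def six_frame_features(seq):
    """Compute the features for the three frames of both strands."""
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            result[(strand, frame)] = frame_features(s, frame)
    return result
```

Calling `six_frame_features` on a sequence returns one feature dictionary per (strand, frame) pair, which is the form in which a coding strand and frame could then be compared; how the features are combined into a decision is left open here.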
