Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq

Tae Young Yang; Seongmun Jeong

JOURNAL

Evolutionary Bioinformatics

Authors: Tae Young Yang and Seongmun Jeong

Publication Date: 13 Nov 2013

Type: Methodology

Journal: Evolutionary Bioinformatics

Citation: Evolutionary Bioinformatics 2013:9 467-478

doi: 10.4137/EBO.S13099

Abstract

In recent years, RNA-seq has become a competitive alternative to microarrays. In an RNA-seq experiment, the expected read count for a gene is proportional to its expression level multiplied by its transcript length, so two genes expressed at the same level but differing in length will yield different total read counts. This property creates a gene-level bias: the proportion of genes called significantly differentially expressed increases with transcript length, a bias that is not present in microarray data. Gene-set analysis seeks to identify gene sets that are enriched in the list of significant genes. In gene-set analysis of RNA-seq data, the gene-level bias in turn produces a gene-set-level bias: a gene set made up of long genes is more likely to appear enriched than a gene set made up of shorter genes. Because a gene's expression is not related to its transcript length, a gene set containing long genes is of no greater biological interest than one containing shorter genes. Accordingly, the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in RNA-seq data.
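The length effect described above can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes Poisson-distributed read counts, a hypothetical `depth_scale` constant, and a simple normal-approximation z-statistic, purely to show why the same fold change is easier to detect for a longer transcript.

```python
import math

def expected_reads(expression, length_kb, depth_scale=10.0):
    """Expected read count, proportional to expression level times transcript length.

    depth_scale is a hypothetical sequencing-depth constant chosen for illustration.
    """
    return depth_scale * expression * length_kb

def approx_z(expression_a, expression_b, length_kb):
    """Approximate z-statistic for the difference of two Poisson counts.

    Assumes Poisson noise, so the variance of the difference is mu_a + mu_b.
    """
    mu_a = expected_reads(expression_a, length_kb)
    mu_b = expected_reads(expression_b, length_kb)
    return (mu_b - mu_a) / math.sqrt(mu_a + mu_b)

# Same 1.5-fold expression change, two different transcript lengths.
short_z = approx_z(1.0, 1.5, length_kb=1.0)
long_z = approx_z(1.0, 1.5, length_kb=8.0)
print(f"short transcript z = {short_z:.3f}, long transcript z = {long_z:.3f}")
```

Under these assumptions the statistic grows with the square root of the transcript length, so the longer gene crosses a conventional significance threshold while the shorter gene, with an identical fold change, does not.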

We present a new gene-set analysis method for RNA-seq, called FDRseq, which uses the grouped false-discovery rate to accurately calculate the statistical significance of a gene-set enrichment score. Numerical examples indicate that FDRseq is appropriate for controlling the transcript-length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed an R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang.
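The abstract does not spell out the FDRseq procedure, so the following is only a generic sketch of the idea behind a grouped false-discovery rate, not the authors' method or their R program. It assumes that items are split into equal-sized groups by transcript length and that the Benjamini-Hochberg adjustment is applied separately within each group. The function names `bh_adjust` and `grouped_bh` and the example values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def grouped_bh(pvalues, lengths, n_groups=2):
    """Apply BH separately within groups formed by sorting on length.

    Grouping by length quantile is an illustrative assumption, not FDRseq itself.
    """
    m = len(pvalues)
    by_length = sorted(range(m), key=lambda i: lengths[i])
    adjusted = [None] * m
    for g in range(n_groups):
        members = by_length[g * m // n_groups:(g + 1) * m // n_groups]
        group_adj = bh_adjust([pvalues[i] for i in members])
        for i, q in zip(members, group_adj):
            adjusted[i] = q
    return adjusted

# Made-up p-values and transcript lengths (in bases) for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
lens = [500, 600, 4000, 5000]
pooled = bh_adjust(pvals)
grouped = grouped_bh(pvals, lens, n_groups=2)
print("pooled :", pooled)
print("grouped:", grouped)
```

Adjusting within length groups means each p-value is ranked only against others of comparable length, so the adjusted values differ from a pooled adjustment in which long and short genes are ranked together.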
