Publication Date: 23 Sep 2013
Type: Original Research
Journal: Cancer Informatics
Citation: Cancer Informatics 2013:12 193-201
doi: 10.4137/CIN.S12862
High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can severely undermine the validity of a study’s conclusions. We evaluate high-throughput RNAseq and miRNAseq datasets, as well as DNA methylation and gene expression microarray datasets, mainly from The Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure: control the study design, detect technical bias using linear regression of principal components on sample annotations, correct for batch effects, and re-evaluate the principal components. This procedure is implemented in the R package swamp and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring technical variation is an important analysis step.
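The detection and re-evaluation steps described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy and does not use or reproduce the API of the swamp package. The function names `pc_batch_association` and `center_within_batches`, the simulated data, and the choice of per-batch mean centering as the correction are all assumptions made for illustration; the article may use other correction methods.

```python
import numpy as np


def pc_batch_association(data, batch, n_components=3):
    """R^2 from regressing each leading principal component on batch labels.

    data  : array of shape (n_features, n_samples)
    batch : array of length n_samples with one batch label per sample
    (Hypothetical helper for illustration; not part of the swamp package.)
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    # Center every feature across samples before the decomposition.
    centered = data - data.mean(axis=1, keepdims=True)
    # SVD of the centered matrix; rows of vt scaled by the singular
    # values are the per-sample scores on each principal component.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt[:n_components] * s[:n_components, None]
    # Design matrix: intercept plus one indicator per non-reference batch.
    levels = np.unique(batch)
    design = np.column_stack(
        [np.ones(len(batch))]
        + [(batch == level).astype(float) for level in levels[1:]]
    )
    r_squared = []
    for pc in scores:
        coef, *_ = np.linalg.lstsq(design, pc, rcond=None)
        residual = pc - design @ coef
        total = ((pc - pc.mean()) ** 2).sum()
        r_squared.append(1.0 - (residual ** 2).sum() / total)
    return np.array(r_squared)


def center_within_batches(data, batch):
    """Subtract each feature's batch mean (one simple correction, assumed here)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    corrected = data.copy()
    for level in np.unique(batch):
        cols = batch == level
        corrected[:, cols] -= corrected[:, cols].mean(axis=1, keepdims=True)
    return corrected


# Simulated example: 200 features, 20 samples, batch "B" shifted upward.
rng = np.random.default_rng(0)
batch = np.array(["A"] * 10 + ["B"] * 10)
data = rng.normal(size=(200, 20))
data[:, batch == "B"] += 3.0

r2_before = pc_batch_association(data, batch)
r2_after = pc_batch_association(center_within_batches(data, batch), batch)
print("R^2 per PC before correction:", r2_before)
print("R^2 per PC after correction: ", r2_after)
```

A high R² for a leading component indicates that much of the dominant variation tracks the batch annotation. Repeating the regression after correction is the re-evaluation step. Note that centering within batches also removes any biological signal that is confounded with batch, which is why the procedure starts with control of the study design.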
Copyright © 2014 Libertas Academica Ltd (except open access articles and accompanying metadata and supplementary files.)