Variability in RNA Quantification Methods: Featured Author Manuel X. Duval

Posted Wed, Jun 12, 2013

A study by Manuel X. Duval and Victoria Y. Wong investigates the extent to which RNA measurements change with sample preparation steps such as labeling. Their recently published Genomics Insights article, Inter-Laboratory Variability in Array-Based RNA Quantification Methods, proposes that where the assay was performed (i.e., in which laboratory) and/or which method of RNA labeling was used represent unexpected co-factors that can mask the true source of biological variability. In this week's featured author interview, Dr Duval answers questions about his background, his research, and his Genomics Insights manuscript.

How did you become interested in studying inter-laboratory variability in array-based RNA quantification methods?

As an undergraduate student in Dr Dominique Job's lab in Marseille, France, I studied the fine biochemical events governing the kinetics of DNA-dependent RNA polymerization. DNA is the medium of biological information; RNA provides the means to turn the information coded in the DNA into an actual physical object. As Prof. Richard Young noted, "much of biological regulation occurs at the level of transcription initiation" (Holstege et al., 1998, "Dissecting the Regulatory Circuitry of a Eukaryotic Genome"). Being able to accurately and reproducibly identify which transcripts change in quantity in response to a given perturbation applied to a biological system delivers a great deal of knowledge about the molecular network that system deploys in order to maintain its integrity and/or to grow. I am therefore a big proponent of projects aimed at addressing biological questions by interrogating RNA content.

On the other hand, I spent years as a molecular biologist dealing with the actual extraction and handling of RNA from various sources. This hands-on exposure made me well aware that while RNA content carries a considerable amount of information, it is also very easy to be misled by confounding factors. This is particularly true for the area of transcriptomics, but not exclusively so. Even DNA can undergo post-extraction modifications, and this issue has to be taken very seriously as well, especially in measurements whose scope is to uncover rare DNA variants.

Biology is gradually coming to operate like its sibling scientific disciplines, such as physics. Before the advent of technologies that allow massively parallel measurements of biomolecules, we biologists spent the vast majority of our time acquiring data: it was low throughput and usually semi-quantitative. We spent the remaining time analysing it, and the data producer and the analyst were usually the same investigator. Now we are moving toward a situation similar to physics, where theoretical physicists handle the vast amounts of data generated by experimental physicists. I am acutely aware of the gap between experimentalists and analysts in biology, and in genomics in particular. This gap needs to be closed for genomics research to deliver according to the expectations and the investment. In summary, in my recent years as a data analyst I encountered data sets from which no confident conclusion could be inferred. Last but not least, the extraordinary study review performed by Drs Baggerly and Coombes (for reference, see Baggerly and Coombes, 2009, "Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology", Annals of Applied Statistics 3(4):1309-1334) greatly influenced my decision to be proactive in the area of reproducibility and to communicate anything that could help my colleagues improve their own research outcomes.

What was previously known about array-based RNA quantification methods? How has your work in this area advanced understanding of it?

The technical variability in RNA measurements is, I believe, widely acknowledged but not sufficiently documented. The main concern has been with the DNA array reagents: it has been shown that the nature of the probe used to measure a given RNA contributes to the RNA level read-out. The US FDA-led MicroArray Quality Control (MAQC) study group has released reports comparing the output of various DNA array platforms using the same source of RNA (for reference, see "The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models", Nature Biotechnology 28, 827–838 (2010), doi:10.1038/nbt.1665). In 2005, Irizarry and colleagues reported some interesting results with technical duplicates; their data set was much smaller than ours, and the RNA was, I believe, labelled according to the same procedure (for reference, see Irizarry et al., 2005, in Nature Methods). More recently, Hansen and Irizarry noticed a bias toward GC-rich templates in RNA-seq data sets and came up with a normalization procedure that attempts to correct it (for reference, see Hansen and Irizarry, "Removing technical variability in RNA-seq data using conditional quantile normalization", Biostatistics 13(2):204–216, 2012).

The amount of variability caused by the method and protocol applied to RNA extraction and labelling in hybridization-based measurements has not yet been sufficiently considered. I would not have predicted it, and I was genuinely worried by our observations. I believe anyone relying on such data sets to infer biological information should be equally concerned. There are ways to increase the amount of biological information recovered, through both experimental and data-analysis approaches, but the first step is evidently to be aware of the magnitude of the issue.

What do you regard as the most important aspect of the results of your research?

Very provocatively, this communication might suggest that the bulk of the NCBI GEO and EBI Array Express database content could be dropped.

That is, of course, an overly blunt and excessive assessment.

Thanks to the annotations of the GEO and Array Express data sets, researchers are able to extract bona fide biological content from them. This is a smart way to rescue some information from these transcriptomics data sets (for reference, see Dudley et al., 2009, in Molecular Systems Biology, "Disease signatures are robust across tissues and experiments").

The main point is to document the main sources of confounding variability, and to become cautious with respect to the conclusions one could draw from a single RNA quantification assay data set. In the case of this communication, if we had had the results from one lab only, we would have run the QC shown in Figure 4 of the article; all the QC results were conclusive. Then one of us would have run a pair-wise comparison between two conditions, extracted the GeneChip identifiers whose read-out values significantly differed, and used this outcome to query the Gene Ontology database via an implementation of a Gene Set Enrichment algorithm (e.g. the Onto-Compare resource, available at http://vortex.cs.wayne.edu/projects.htm#Onto-Compare). Given the experimental perturbations studied, we would have had some expectations as to which Gene Ontology categories would be biased in our treated samples vs. controls. For example, if you expose a biological system to excessive heat, you expect to trigger a heat-shock response. If your Gene Ontology query returns such enrichments, the temptation is to conclude that the whole gene list that constituted the query is composed of genes all involved in the heat-shock response. Such statements are common in studies involving high-throughput measurement of RNA. The worry is that such published statements are in some instances used to populate content in genomic knowledge-base resources.
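Below is a minimal sketch, in Python, of the single-laboratory workflow just described: a pair-wise comparison between two conditions followed by an over-representation test against one gene-set category. It is not the authors' code; the Welch t-test, Bonferroni correction, hypergeometric test, toy data, and the example "heat-shock-like" category are illustrative assumptions standing in for whatever a production pipeline (e.g. Onto-Compare) would actually use.

```python
# Sketch of a single-lab analysis: differential probes, then GO-style enrichment.
# All data, thresholds, and the gene-set membership below are hypothetical.
import numpy as np
from scipy import stats


def differential_probes(treated, control, alpha=0.05):
    """Return indices of probes whose read-outs differ between conditions.

    treated, control: 2-D arrays (probes x replicate arrays) of log2 intensities.
    A Welch t-test per probe with Bonferroni correction stands in for the
    statistic a real pipeline would use.
    """
    _, p = stats.ttest_ind(treated, control, axis=1, equal_var=False)
    return np.where(p * treated.shape[0] < alpha)[0]


def enrichment_pvalue(selected, category, universe_size):
    """Hypergeometric over-representation test for one gene-set category
    (the core of a Gene Set Enrichment-style query)."""
    overlap = len(selected & category)
    return stats.hypergeom.sf(overlap - 1, universe_size,
                              len(category), len(selected))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1000 probes, 3 replicate arrays per condition (toy data)
    control = rng.normal(8.0, 1.0, size=(1000, 3))
    treated = control + rng.normal(0.0, 1.0, size=(1000, 3))
    treated[:20] += 2.0                    # spike in 20 "responding" probes

    hits = set(differential_probes(treated, control))
    heat_shock_like = set(range(0, 30))    # hypothetical category membership
    p = enrichment_pvalue(hits, heat_shock_like, universe_size=1000)
    print(f"{len(hits)} differential probes, enrichment p = {p:.3g}")
```

In a real analysis, probe-to-gene mapping and category membership would come from the array annotation and Gene Ontology releases rather than being hard-coded, and the point of the interview stands: an enrichment result from a single lab's data set can reflect technical co-factors as easily as biology.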

Replicating RNA measurements with various technical replicates seems an expensive proposition in the short term. In the long run, it is evidently the cost-effective solution, since the deliverable of such assays is information, which is used to build hypotheses for downstream investigations. If you get it right in the first place, i.e. if you minimize the number of false positives, you significantly increase the value of the read-out, for your own research but also for the whole genomics community.

What was the greatest difficulty you encountered in studying inter-laboratory variability in array-based quantification methods?

There was none. The data set delivered by each of the five laboratories was of such good quality that the analysis was actually straightforward.

To read about Dr Duval's work, please see his WordPress blog. To learn more about Dr Duval's graduate programme, the Bioinformatics Certificate at the University of New Haven, please see the course information. The paper Inter-Laboratory Variability in Array-Based RNA Quantification Methods is available to download, comment on, and share.


Posted in: Authors
