A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments
Akihiro Hirakawa1, Yasunori Sato2, Chikuma Hamada3 and Isao Yoshimura3
1Genetics Division, National Cancer Center Research Institute, Chuo-ku, Tokyo, Japan. 2Department of Biostatistics, Harvard School of Public Health, Boston, U.S.A. 3Faculty of Engineering, Tokyo University of Science, Shinjuku-ku, Tokyo, Japan.
Abstract
Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
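The two ideas in the abstract, a t-type score with shrunken variances in the denominator and a choice between mean and median FDR estimates, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the fixed shrinkage `weight` (a James-Stein type estimator would derive the shrinkage intensity from the data rather than take it as an argument), the Welch-style form of the score, and the simple permutation-based FDR ratio, which omits refinements such as an estimate of the proportion of non-differentially expressed genes. It is not the authors' estimator or the SAM implementation.

```python
import numpy as np


def shrink_variances(s2, weight):
    """Pull each gene's sample variance toward the mean variance across genes.

    `weight` in [0, 1] is a hypothetical fixed shrinkage intensity (assumption);
    weight=0 leaves the variances unchanged, weight=1 replaces them all by the mean.
    """
    s2 = np.asarray(s2, dtype=float)
    return weight * s2.mean() + (1.0 - weight) * s2


def t_type_score(x, y, weight=0.5):
    """Welch-style score per gene; rows are genes, columns are samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shrink the per-gene sample variances separately within each group.
    vx = shrink_variances(x.var(axis=1, ddof=1), weight)
    vy = shrink_variances(y.var(axis=1, ddof=1), weight)
    standard_error = np.sqrt(vx / x.shape[1] + vy / y.shape[1])
    return (x.mean(axis=1) - y.mean(axis=1)) / standard_error


def estimated_fdr(observed, permuted, cutoff, summary=np.median):
    """Estimate the FDR at `cutoff` from permutation scores.

    `permuted` has shape (n_permutations, n_genes). The number of false
    positives is summarized across permutations by `summary` (np.median or
    np.mean) and divided by the number of genes called significant.
    """
    called = int(np.sum(np.abs(observed) >= cutoff))
    if called == 0:
        return 0.0
    false_counts = np.sum(np.abs(permuted) >= cutoff, axis=1)
    return min(1.0, float(summary(false_counts)) / called)
```

Passing `summary=np.mean` instead of `summary=np.median` switches between the two FDR estimates compared in the abstract. The two can differ noticeably when a few permutations produce many large scores, because the mean is pulled up by those permutations while the median is not.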