Dual KS: Defining Gene Sets with Tissue Set Enrichment Analysis
Yarong Yang (3,4), Eric J. Kort (1,2,3), Nader Ebrahimi (4), Zhongfa Zhang (2) and Bin T. Teh (2)
(1) Laboratory of Molecular Epidemiology, Van Andel Research Institute, Grand Rapids, MI.
(2) Laboratory of Cancer Genetics, Van Andel Research Institute, Grand Rapids, MI.
(3) These authors contributed equally to this work.
(4) Division of Statistics, Northern Illinois University, DeKalb, IL.
Abstract
Background: Gene set enrichment analysis (GSEA) is an analytic approach that simultaneously reduces the dimensionality of microarray data and makes the biological meaning of observed gene expression patterns easy to infer. Here we invert the GSEA process to identify class-specific gene signatures. Because our method uses the Kolmogorov-Smirnov (KS) statistic both to define class-specific signatures and to classify samples using those signatures, we have termed this methodology “Dual-KS” (DKS).
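The dual use of the KS statistic can be illustrated with a minimal Python sketch. It assumes the unweighted KS running-sum score familiar from GSEA. To define signatures, each gene ranks the training samples by its expression, and the gene is scored by how strongly samples of a given class cluster at the top; the highest-scoring genes form that class's "up" signature. To classify, a new sample ranks its own genes, and it is assigned to the class whose signature scores highest. The function names (`ks_score`, `define_signatures`, `classify`) and the fixed signature size `n_genes` are hypothetical. This is not the dualKS package's API, and the package's actual weighting and normalization choices may differ.

```python
import numpy as np

def ks_score(ranked_items, hit_set):
    """Unweighted KS running-sum score (GSEA-style, an assumption here).

    Walk down ranked_items (best first), stepping up for members of
    hit_set and down otherwise; return the signed maximum deviation.
    """
    n = len(ranked_items)
    hits = np.array([item in hit_set for item in ranked_items])
    n_hit = int(hits.sum())
    if n_hit == 0 or n_hit == n:
        return 0.0
    # Step sizes chosen so the running sum starts and ends at zero.
    steps = np.where(hits, 1.0 / n_hit, -1.0 / (n - n_hit))
    running = np.cumsum(steps)
    return float(running[np.argmax(np.abs(running))])

def define_signatures(X, labels, n_genes):
    """Inverted GSEA step (sketch): X is genes x samples, labels per sample.

    For each class, score every gene by ranking samples on that gene and
    treating the class's samples as the "hit set"; keep the top n_genes.
    """
    signatures = {}
    for c in np.unique(labels):
        members = set(np.flatnonzero(labels == c).tolist())
        scores = np.array([
            ks_score(np.argsort(-X[g]).tolist(), members)  # highest first
            for g in range(X.shape[0])
        ])
        signatures[c] = np.argsort(-scores)[:n_genes]
    return signatures

def classify(x, signatures):
    """Classification step (sketch): rank one sample's genes, score each
    class signature with the same KS statistic, return the best class."""
    ranked_genes = np.argsort(-x).tolist()
    scores = {c: ks_score(ranked_genes, set(sig.tolist()))
              for c, sig in signatures.items()}
    return max(scores, key=scores.get)
```

Because every signature has the same length, the per-class scores for a sample are directly comparable. With real data one would probably center or rank-normalize each gene across the training samples first, since ranking raw values within a single sample favors genes that are highly expressed everywhere; the sketch omits that step.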
Results: The optimum gene signature identified by the DKS algorithm was smaller than the signatures identified by the other methods to which it was compared in 5 of the 10 datasets. The estimated error rate of DKS using the optimum gene signature was lower than that of the random forest method in 4 of the 10 datasets and equivalent in 2 others. DKS performance relative to the other benchmarked algorithms was similar to its performance relative to random forests.
Conclusions: DKS is an efficient analytic methodology that can identify highly parsimonious gene signatures useful for classification in microarray studies. The algorithm is available as the dualKS package for R, part of the Bioconductor project.