Publication Date: 09 Dec 2014
Type: Review
Journal: Cancer Informatics
Citation: Cancer Informatics 2014:Suppl. 1 145-158
doi: 10.4137/CIN.S13875
Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, no previous research has analyzed their overfitting in disease diagnosis based on high-dimensional omics data, an analysis that is essential to avoid deceptive diagnostic results and to enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We find that disease diagnosis with an SVM classifier inevitably encounters overfitting under a Gaussian kernel because of the large data variations generated by high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad hoc parametric tuning approaches, it not only robustly overcomes the overfitting problem but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm, Gene-Switch-Marker (GSM), which captures meaningful biomarkers by taking advantage of SVM overfitting on single genes.
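The claim that a Gaussian kernel runs into trouble when data variations are large can be illustrated with a small numerical sketch. The code below is not taken from the paper. It assumes synthetic, normally distributed "expression" values and hypothetical helper names (`gaussian_kernel`, `kernel_matrix`, `max_off_diagonal`), and it shows only one commonly cited mechanism: when pairwise distances are large relative to the kernel bandwidth, the kernel matrix collapses toward the identity, so every training sample looks similar only to itself and the classifier can memorize the training set.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma=1.0):
    """Pairwise kernel matrix for a list of samples."""
    return [[gaussian_kernel(x, y, sigma) for y in samples] for x in samples]


def max_off_diagonal(K):
    """Largest similarity between two distinct samples."""
    n = len(K)
    return max(K[i][j] for i in range(n) for j in range(n) if i != j)


random.seed(0)

# Assumed toy setting: few samples, many features, large spread per feature.
n_samples, n_genes, spread = 20, 2000, 10.0
samples = [
    [random.gauss(0.0, spread) for _ in range(n_genes)]
    for _ in range(n_samples)
]

# Narrow bandwidth: distances dwarf sigma, so off-diagonal entries vanish.
K_narrow = kernel_matrix(samples, sigma=1.0)
largest_narrow = max_off_diagonal(K_narrow)

# Bandwidth scaled to the typical pairwise distance (an assumption made here
# for contrast): off-diagonal entries stay clearly away from zero.
wide_sigma = spread * math.sqrt(n_genes)
K_wide = kernel_matrix(samples, sigma=wide_sigma)
largest_wide = max_off_diagonal(K_wide)

print("max off-diagonal, sigma=1.0:", largest_narrow)
print("max off-diagonal, scaled sigma:", largest_wide)
```

With the narrow bandwidth the kernel matrix is numerically the identity, which is the degenerate situation in which a kernel classifier carries no information about how a new sample relates to the training samples. The scaled bandwidth is shown only for contrast and is not the sparse-coding kernel proposed in the article.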
This is the first time we have submitted a manuscript to Cancer Informatics. We thank the peer reviewers for their insightful comments, which have improved our manuscript markedly. We were pleased to find that the staff were extremely helpful and kept us informed of the progress of the submission step by step. Our experience with Cancer Informatics has been tremendous. Thank you very much!