Close
Help




JOURNAL

Cancer Informatics

Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models

Submit a Paper


Cancer Informatics 2015:Suppl. 2 129-138

Original Research

Published on 05 May 2015

DOI: 10.4137/CIN.S17284


Further metadata provided in PDF



Sign up for email alerts to receive notifications of new articles published in Cancer Informatics

Abstract

Background: DNA microarray studies identified gene expression signatures predictive of metastatic relapse in early breast cancer. Standard feature selection procedures applied to reduce the set of predictive genes did not take into account the correlation between genes. In this paper, we studied the performances of three high-dimensional regression methods – CoxBoost, LASSO (Least Absolute Shrinkage and Selection Operator), and Elastic net – to identify prognostic signatures in patients with early breast cancer.

Methods: We analyzed three public retrospective datasets, including a total of 384 patients with axillary lymph node-negative breast cancer. The Amsterdam van't Veer's training set of 78 patients was used to determine the optimal gene sets and classifiers using sensitivity thresholds resulting in misclassification of no more than 10% of the poor-prognosis group. To ensure the comparability between different methods, an automatic selection procedure was used to determine the number of genes included in each model. The van de Vijver's and Desmedt's datasets were used as validation sets to evaluate separately the prognostic performances of our classifiers. The results were compared to the original Amsterdam 70-gene classifier.

Results: The automatic selection procedure reduced the number of predictive genes up to a minimum of six genes. In the two validation sets, the three models (Elastic net, LASSO, and CoxBoost) led to the definition of genomic classifiers predicting the 5-year metastatic status with similar performances, with respective 59, 56, and 54% accuracy, 83, 75, and 83% sensitivity, and 53, 52, and 48% specificity in the Desmedt's dataset. In comparison, the Amsterdam 70-gene signature showed 45% accuracy, 97% sensitivity, and 34% specificity. The gene overlap and the classification concordance between the three classifiers were high. All the classifiers added significant prognostic information to that provided by the traditional prognostic factors and showed a very high overlap with respect to gene ontologies (GOs) associated with genes overexpressed in the predicted poor-prognosis vs. good-prognosis classes and centred on cell proliferation. Interestingly, all classifiers reported high sensitivity to predict the 4-year status of metastatic disease.

Conclusions: High-dimensional regression methods are attractive in prognostic studies because finding a small subset of genes may facilitate the transfer to the clinic, and also because they strengthen the robustness of the model by limiting the selection of false-positive predictive genes. With only six genes, the CoxBoost classifier predicted the 4-year status of metastatic disease with 93% sensitivity. Selecting a few genes related to ontologies other than cell proliferation might further improve the overall sensitivity performance.



Downloads

PDF  (744.48 KB PDF FORMAT)

RIS citation   (ENDNOTE, REFERENCE MANAGER, PROCITE, REFWORKS)

Supplementary Files 1  (58.11 KB ZIP FORMAT)

BibTex citation   (BIBDESK, LATEX)

XML

PMC HTML


Sharing


What Your Colleagues Say About Cancer Informatics
Publishing in Cancer Informatics was the fastest publication I have ever experienced and has received the highest viewing rate.  So it is a great place to publish your very latest research.
Dr Yue Zhang (Boston, MA, USA)
More Testimonials

Quick Links


New article and journal news notification services
Email Alerts RSS Feeds
Facebook Google+ Twitter
Pinterest Tumblr YouTube