





Evolutionary Bioinformatics


Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning




Publication Date: 30 Apr 2012

Type: Methodology

Journal: Evolutionary Bioinformatics

Citation: Evolutionary Bioinformatics 2012:8 197-205

doi: 10.4137/EBO.S9407

Abstract

Multiclass classification and feature (variable) selection are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as the K-nearest neighbor (KNN) method extend naturally to multiclass problems and usually perform well with unbalanced data, but they suffer from the curse of dimensionality: their performance degrades when applied to high-dimensional data. On the other hand, model-based methods such as logistic regression require decomposing the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high-dimensional data with L1- or Lp-penalized methods, such approaches can only select independent features, and the features selected for different binary problems are usually different. The one-vs.-rest scheme also produces unbalanced binary problems even when the original multiclass problem is balanced.
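To make the contrast between the two families concrete, here is a minimal Python sketch (an illustration, not code from the paper). `knn_predict` classifies by majority vote among the k nearest neighbors, which handles any number of classes directly, while `one_vs_rest_labels` shows how the one-vs.-rest scheme turns a balanced three-class problem into an unbalanced binary one. Both function names, the Euclidean distance, k = 3, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance).

    Any number of classes is handled directly; no binary decomposition is needed.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def one_vs_rest_labels(labels, positive):
    """Recode a multiclass label vector as binary: 1 for `positive`, 0 for all other classes."""
    return [1 if label == positive else 0 for label in labels]


# Three well-separated classes in two dimensions (toy data)
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
train_y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
print(knn_predict(train_X, train_y, (5.5, 5.5)))   # → B
print(knn_predict(train_X, train_y, (10.5, 0.2)))  # → C

# A balanced 3-class problem (10 samples per class) ...
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
binary = one_vs_rest_labels(labels, "A")
# ... becomes an unbalanced binary problem: 10 positives vs 20 negatives
print(sum(binary), len(binary) - sum(binary))  # → 10 20
```

With K balanced classes, each one-vs.-rest subproblem has a 1:(K - 1) class ratio created purely by the decomposition, which is the imbalance the abstract refers to.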

By combining instance-based and model-based learning, we propose an efficient learning method that integrates KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. The proposed method simultaneously minimizes the intra-class distance and maximizes the inter-class distance while estimating fewer parameters. It is very efficient for problems with small sample sizes and unbalanced classes, a situation common in many real applications. In addition, our model-based feature selection methods can identify highly correlated features simultaneously, avoiding the multiplicity problem that arises from multiple testing. The proposed method is evaluated on simulated and real data, including an unbalanced microRNA dataset for leukemia and a multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well in these limited computational experiments.
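The abstract does not spell out the KNNLog objective, so the following is only a hypothetical Python sketch of the general idea it describes (a constrained logistic model whose coefficients shape a KNN distance), not the authors' algorithm. It assumes a logistic model on pairwise feature differences, P(pair from different classes) = sigmoid(b + sum_f w_f * |x_if - x_jf|), with one nonnegative weight per feature. Fitting it pushes same-class pairs toward small weighted distances and different-class pairs toward large ones; the learned weights then define a weighted distance for KNN, and features whose weight the L1 penalty drives to zero are effectively dropped. The function names, the proximal-gradient fit, the toy data, and the hyperparameters (`lr`, `epochs`, `l1`) are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_pairwise_weights(X, y, lr=0.1, epochs=500, l1=0.01):
    """Hypothetical sketch: fit P(different class) = sigmoid(b + sum_f w_f*|x_if - x_jf|)
    over all training pairs by proximal gradient descent on the mean log-loss
    plus an L1 penalty, keeping every w_f >= 0."""
    p = len(X[0])
    # One row per pair: absolute feature differences and target (1 = different classes)
    pairs = [
        ([abs(a - c) for a, c in zip(X[i], X[j])], 1.0 if y[i] != y[j] else 0.0)
        for i, j in combinations(range(len(X)), 2)
    ]
    n = len(pairs)
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for d, t in pairs:
            err = _sigmoid(b + sum(wf * df for wf, df in zip(w, d))) - t
            grad_b += err
            for f in range(p):
                grad_w[f] += err * d[f]
        b -= lr * grad_b / n
        # Gradient step, then shrink by the L1 penalty and clip at zero
        w = [max(0.0, wf - lr * (g / n + l1)) for wf, g in zip(w, grad_w)]
    return w, b


def weighted_knn_predict(X, y, x, w, k=3):
    """Majority vote among the k nearest points under the learned weighted L1 distance."""
    def dist(i):
        return sum(wf * abs(a - c) for wf, a, c in zip(w, X[i], x))
    nearest = sorted(range(len(X)), key=dist)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Toy data: feature 0 separates the classes, feature 1 is noise
X = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.5),
     (1.0, 0.4), (1.1, 0.8), (0.9, 0.1),
     (2.0, 0.6), (2.1, 0.2), (1.9, 0.7)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
w, b = fit_pairwise_weights(X, y)
print(w)  # the weight on feature 0 should exceed the weight on feature 1
print(weighted_knn_predict(X, y, (1.05, 0.3), w))  # expected: B
```

Because this sketch has one weight per feature plus an intercept, it estimates p + 1 parameters regardless of the number of classes, whereas one-vs.-rest logistic regression with K classes estimates K(p + 1). That saving is a property of the sketch; the abstract does not state KNNLog's exact parameter count.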

