Using Empirically Constructed Lexical Resources for Named Entity Recognition

Siddhartha Jonnalagadda; Trevor Cohen; Stephen Wu; Hongfang Liu; Graciela Gonzalez

JOURNAL

Biomedical Informatics Insights

Biomedical Informatics Insights 2013:Suppl. 1 17-27

Original Research

Published on 24 Jun 2013

DOI: 10.4137/BII.S11664

Abstract

Because of privacy concerns and the expense of creating an annotated corpus, existing annotated corpora are small and might not contain enough examples for a statistical learner to extract all named entities precisely. In this work, we evaluate the value of automatically generated features based on distributional semantics for machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. Adding the n-nearest words feature to a baseline system produced a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons, but also replace them. This effect is observed in extracting concepts from both biomedical literature and clinical notes.
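
As a rough illustration of the n-nearest words idea mentioned in the abstract, the sketch below derives neighbor-word features for a token from unannotated text. It is not the authors' implementation: the abstract does not say how the distributional model was built, so a plain co-occurrence count model with cosine similarity is used here as a stand-in. The function names (`build_context_vectors`, `n_nearest_words`, `token_features`), the window size, and the feature keys are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions.

    Assumption: a simple co-occurrence count model stands in for whatever
    distributional model the paper actually used.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def n_nearest_words(word, vectors, n=3):
    """Return up to n words whose context vectors are most similar to `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:n] if score > 0]


def token_features(token, vectors, n=3):
    """Build a feature dict for one token (hypothetical feature names)."""
    word = token.lower()
    features = {"word": word}
    for rank, neighbor in enumerate(n_nearest_words(word, vectors, n), 1):
        features[f"nearest_{rank}"] = neighbor
    return features
```

In a setup like this, `build_context_vectors` would be run once over a large unannotated corpus, and the output of `token_features` would be added to the feature set of a sequence labeler trained on the small annotated corpus. The neighbor words act as an empirically derived lexicon: tokens that never appear in the annotated data can still share features with tokens that do.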
