JOURNAL

Biomedical Informatics Insights

Using n-Grams for Syndromic Surveillance in a Turkish Emergency Department Without English Translation: A Feasibility Study

Sylvia Halász, Philip Brown, Cem Oktay, Arif Alper Çevik, Isa Kiliçaslan, Colin Goodall, Dennis G Cochrane, Thomas R Fowler, Guy Jacobson, Simon Tse and John R Allegra

Biomedical Informatics Insights 2013:6 29-33

Original Research

Published on 25 Apr 2013

DOI: 10.4137/BII.S11334

Abstract

Introduction: Syndromic surveillance is designed for early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which are generally recorded in the local language. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories. An n-gram classifier is built by using text fragments (n-grams) to measure associations between CCs and a syndromic grouping of ICD codes.
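To make the idea of measuring associations between text fragments and a syndromic grouping concrete, the following is a minimal sketch and not the authors' implementation. It assumes character trigrams and a simple conditional frequency as the association measure; the function names and the toy Turkish records are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_association(records, n=3):
    """Estimate P(syndrome | n-gram) from (chief_complaint, in_syndrome) pairs.

    This conditional frequency is an assumed stand-in for whatever
    association measure the original method uses.
    """
    total, positive = Counter(), Counter()
    for complaint, in_syndrome in records:
        # Count each n-gram at most once per visit.
        for gram in set(char_ngrams(complaint, n)):
            total[gram] += 1
            if in_syndrome:
                positive[gram] += 1
    return {gram: positive[gram] / total[gram] for gram in total}

# Hypothetical toy data: a chief complaint paired with a flag saying whether
# the visit's ICD code fell in the respiratory grouping.
records = [
    ("öksürük", True),
    ("öksürük ve ateş", True),
    ("karın ağrısı", False),
    ("baş ağrısı", False),
]
scores = ngram_association(records)
```

Because the fragments are taken directly from the recorded text, nothing in this sketch depends on translating the complaints into English.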

Objectives: The objective was to create a Turkish n-gram CC classifier for the respiratory syndrome and then compare daily volumes between the n-gram CC classifier and a respiratory ICD-10 code grouping on a test set of data.

Methods: The design was a feasibility study based on retrospective cohort data. The setting was a university hospital emergency department (ED) in Turkey. Included were all ED visits in the 2002 database of this hospital. Two of the authors created, by consensus, a respiratory grouping of International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes, chosen to be similar to a standard respiratory (RESP) grouping of ICD codes created by the Electronic Surveillance System for Early Notification of Community-based Epidemics (ESSENCE), a project of the Centers for Disease Control and Prevention. An n-gram method adapted from AT&T Labs' technologies was applied to the first 10 months of data as a training set to create a Turkish CC RESP classifier. The classifier was then tested on the subsequent 2 months of visits to generate a time series graph and to determine the correlation between the daily volumes measured by the CC classifier and those measured by the RESP ICD-10 grouping.
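The chronological split and the daily volume counts described above could be sketched as follows. This is an illustration under stated assumptions: the first 10 months are taken to be January through October, the visit fields (`date`, `cc`, `icd10`) are hypothetical, ICD-10 chapter J is used as a crude stand-in for the authors' consensus RESP grouping, and a substring test stands in for the trained CC classifier.

```python
from collections import Counter
from datetime import date

def split_by_month(visits, last_training_month=10):
    """Split visits chronologically: months 1..10 for training, the rest for testing."""
    train = [v for v in visits if v["date"].month <= last_training_month]
    test = [v for v in visits if v["date"].month > last_training_month]
    return train, test

def daily_counts(visits, is_positive):
    """Count, per calendar day, the visits flagged by the given predicate."""
    counts = Counter()
    for v in visits:
        if is_positive(v):
            counts[v["date"]] += 1
    return counts

# Hypothetical toy visits; the field names and values are illustrative only.
visits = [
    {"date": date(2002, 3, 5), "cc": "öksürük", "icd10": "J20.9"},
    {"date": date(2002, 11, 2), "cc": "öksürük", "icd10": "J06.9"},
    {"date": date(2002, 11, 2), "cc": "karın ağrısı", "icd10": "R10.4"},
    {"date": date(2002, 12, 1), "cc": "nefes darlığı", "icd10": "J45.9"},
]
train, test = split_by_month(visits)

# Stand-in predicates, not the study's actual grouping or classifier.
icd_daily = daily_counts(test, lambda v: v["icd10"].startswith("J"))
cc_daily = daily_counts(test, lambda v: "öks" in v["cc"])
```

Splitting by date rather than at random keeps the test period strictly after the training period, which mirrors how a surveillance classifier would be used prospectively.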

Results: The Turkish ED database contained 30,157 visits. The correlation (R²) of n-gram versus ICD-10 for the test set was 0.78.
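For readers unfamiliar with the statistic, R² between two daily count series can be computed as the squared Pearson correlation. The sketch below assumes that definition, which the abstract does not spell out, and uses hypothetical daily counts rather than study data.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov * cov) / (var_x * var_y)

# Hypothetical daily respiratory volumes over five days (not study data).
cc_daily_volume = [12, 15, 9, 20, 17]
icd_daily_volume = [10, 14, 11, 18, 16]
agreement = r_squared(cc_daily_volume, icd_daily_volume)
```

A value near 1 means the two series rise and fall together from day to day, which matters more for outbreak detection than agreement on absolute counts.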

Conclusion: The n-gram method automatically created a CC RESP classifier of the Turkish CCs that performed similarly to the ICD-10 RESP grouping. The n-gram technique has the advantage of systematic, consistent, and rapid deployment as well as language independence.
