RODS PUBLICATIONS
FULL CITATIONS AND SUMMARIES
Refereed Journals
Wagner MM, Espino JU, Tsui F-C, Gesteland P, Chapman WW, Ivanov O, Moore AW, Wong W-K, Dowling J, Hutman J. Syndrome and Outbreak Detection from Chief Complaint Data: Experience of the Real-Time Outbreak and Disease Surveillance Project. Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003. Submitted by invitation to the CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 28-31, 2004. See http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a7.htm
Objectives: A goal of the RODS project was to create a regional testbed that collects and analyzes, for at least one type of data, a health status indicator that might provide early warning of disease outbreaks, especially large cohort exposures such as one resulting from an aerosolized anthrax release. A second goal was to study the detectability of outbreaks using temporal and spatial statistical analysis of such data.
Methods: In the United States, chief complaints of patients registering for service in emergency rooms are recorded electronically in registration computer systems as part of normal workflow. The registration computers then immediately transmit chief complaint data as HL7 messages to other computers in the health system via an HL7 message router. The HL7 message routers de-identify these messages and forward them in real time via the Internet to a center serving the region. At the center, a Bayesian classifier automatically assigns each chief complaint to one of eight syndromic categories. The category data are analyzed by time series algorithms and presented to users in displays such as timelines and maps. Several experiments were conducted to validate these data.
Results: The project demonstrated the feasibility of such data collection at the greater-than-50% level in large-scale deployments in Pennsylvania, Utah, and Ohio. Experiments demonstrated that the Bayesian classifier can discriminate between different syndromic presentations and that its performance was adequate to support accurate and timely detection of large seasonal outbreaks of disease.
Conclusions: Monitoring of chief complaints is highly feasible, and early evidence suggests that there is a signal that can be exploited through temporal-spatial analysis to detect large outbreaks (or smaller outbreaks that are more spatially confined). Future directions for research include additional experiments to better understand the limits of detectability from this type of data, as well as experiments using additional types of data (e.g., temperatures) to increase the accuracy of classification and to enable classification of patients into more specific syndromes.
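The Methods above mention a Bayesian classifier that assigns each free-text chief complaint to a syndromic category. The RODS classifier itself is not specified in this summary, so what follows is only a minimal sketch of one common Bayesian text classifier, multinomial naive Bayes with Laplace smoothing. The training complaints and the two categories are hypothetical toy data, not the eight RODS syndromes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (chief complaint, syndrome).
# Illustrative only; not the RODS training set or category scheme.
TRAINING = [
    ("cough fever", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("sore throat cough", "respiratory"),
    ("vomiting diarrhea", "gastrointestinal"),
    ("abdominal pain vomiting", "gastrointestinal"),
    ("diarrhea", "gastrointestinal"),
]

def tokenize(text):
    """Lowercase and split a free-text chief complaint into word tokens."""
    return text.lower().split()

def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("fever and cough", model))   # respiratory
print(classify("nausea vomiting", model))   # gastrointestinal
```

Naive Bayes treats words as independent given the category, which keeps training to simple counting; unseen words such as "nausea" are handled by the add-one smoothing rather than zeroing out a category.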
Espino JU, Wagner MM, Szczepaniak MC, Tsui F-C, Su H, Olszewski R, Liu Z, Zeng X, Ma L, Lu Z, Dara J. Removing a Barrier to Computer-Based Outbreak and Disease Surveillance: The RODS Open Source Project. Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003. CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 32-39, 2004. See http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a8.htm
Objectives: The objective of the Real-time Outbreak and Disease Surveillance (RODS) Open Source Project is to accelerate the deployment of computer-based outbreak and disease surveillance systems by (1) writing software, and (2) catalyzing the formation of a community of users, developers, consultants, companies and scientists that support open source public health surveillance software.
Methods: The University of Pittsburgh seeded the open source project by releasing the RODS software under the GNU General Public License. We created an open-source-project infrastructure consisting of a website, mailing lists for developers and users, selected individuals who serve as lead software developers, and shared code-development tools. These resources are intended to encourage growth of a project community. We use the following metrics to measure the progress of the project: usage of the website, number of software downloads, number of inquiries, number of system deployments, and number of new features or modules added to the code base.
Results: Between September and November 2003, there were 5,370 page views of the project website, 59 downloads of the software, 20 inquiries, and one new deployment, and we added four new features.
Conclusions: Our three-month experience has been that health departments and companies are more interested at present in using the software “as is” than in customization and/or the development of new features. Such interest satisfies our primary goal of accelerating the deployment of such systems. We expect that after the initial effort of installation is completed, these health departments and companies will start to customize the software and contribute these enhancements to the public code base.
Wagner MM, Tsui F-C, Espino JU, Hogan WR, Hutman J, Hersh J, Neill D, Moore AW, Parks G, Lewis C, Aller R. The National Retail Data Monitor for Public Health Surveillance. Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003. CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 40-42, 2004. See http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a9.htm
Objectives: The objectives of the NRDM are to collect 70% of all sales data for selected over-the-counter (OTC) healthcare products in near real time and to redistribute them to public health officials in raw or analyzed form.
Methods: Retailers transmit daily sales data encoded using Universal Product Codes (UPCs) to the NRDM server facility. The UPC-level data are then aggregated into 18 analytic categories and analyzed by spatial and temporal algorithms to detect anomalous levels of sales that might indicate a disease outbreak. The data are also redistributed in raw form to health departments that request them.
Results: At present, the NRDM receives data from more than 15,000 stores and has an additional 4,000 stores under agreement, representing an estimated 40% market share nationally (with higher coverage in large metropolitan areas). The data are received with a latency of one day or less from the time of sale. The NRDM has created more than 295 accounts for users in 41 states. Eight jurisdictions receive raw data aggregated to the zip-code level. Future plans include continued recruitment of retailers, reduction of time latency, deployment in other countries, broadening of scope to include prescription antibiotics and other selected prescription medications, research use of the data, and identifying a permanent source of funding.
Conclusions: The NRDM is one of the first examples of a national data utility for public health surveillance, one that collects, redistributes, and analyzes daily sales volume data for selected healthcare products. A single national-level data utility reduces the effort required of both data providers and health departments.
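The Methods above describe aggregating UPC-level sales into analytic categories, and the Results mention raw data aggregated to the zip-code level. A minimal sketch of that aggregation step follows, assuming a hypothetical UPC-to-category lookup table and sales records of the form (date, zip code, UPC, units). The UPCs and category names are invented placeholders, not the NRDM's 18 categories.

```python
from collections import defaultdict

# Hypothetical UPC-to-category lookup; placeholder codes and names only.
UPC_CATEGORY = {
    "000000000017": "cough/cold",
    "000000000024": "cough/cold",
    "000000000031": "antidiarrheal",
}

def aggregate(sales):
    """Sum unit sales by (date, zip code, category); unmapped UPCs are skipped."""
    totals = defaultdict(int)
    for date, zip_code, upc, units in sales:
        category = UPC_CATEGORY.get(upc)
        if category is None:
            continue  # product is not in any monitored category
        totals[(date, zip_code, category)] += units
    return dict(totals)

# Hypothetical daily records from two zip codes.
sales = [
    ("2003-11-01", "15213", "000000000017", 4),
    ("2003-11-01", "15213", "000000000024", 3),
    ("2003-11-01", "15213", "000000000031", 2),
    ("2003-11-01", "15217", "999999999999", 5),
]
print(aggregate(sales))
# {('2003-11-01', '15213', 'cough/cold'): 7, ('2003-11-01', '15213', 'antidiarrheal'): 2}
```

Keying the totals by date, zip code, and category yields the daily per-area counts that temporal and spatial detection algorithms would then consume.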
Methods: We
reviewed multiple sources of information to create a practical description
of existing surveillance systems and how they interact to detect
outbreaks. The results of this study were summarized in a system diagram.
To understand how the elements interact, we examined a sample of recent
outbreaks to determine how they were detected with reference to the system
diagram.
Results:
The system diagram for the de facto U.S. outbreak detection system
consists of five components: the clinical health care system, local/state
health agencies, federal agencies, academic/professional organizations and
collaborating governmental jurisdictions. Primary data collection occurs
at the clinical health care systems and local health agencies. The
outbreaks studied showed all five components aggregating, analyzing and
sharing data.
Conclusions:
The current U.S. approach to detection of disease outbreaks is complex and
involves many organizations interacting in a loosely coupled manner.
State and local health departments and the health care system are major
components in detection of outbreaks.
Wong W-K, Moore AW, Cooper GF, Wagner MM. WSARE: What's Strange About Recent
Events. Journal of Urban Health: Bulletin of the New York Academy of
Medicine 80/2 (Supplement 1) 66-75, 2003.
Wagner MM, Robinson JM, Tsui F-C, Espino JU, Hogan WR. Design of a
National Retail Data Monitor for Public Health Surveillance. Journal of
the American Medical Informatics
Association 10/5 (Sept/Oct) 409-418, 2003.
Summary: The National Retail Data Monitor receives data daily from
10,000 stores, including pharmacies that sell health care products. These
stores belong to national chains that process sales data centrally and
utilize Universal Product Codes and scanners to collect sales information
at the cash register. The high degree of retail sales automation enables
the monitor to collect information from thousands of store locations in
near real time for use in public health surveillance. The monitor
provides user interfaces that display summary sales data on timelines and
maps. Algorithms monitor the data automatically on a daily basis to
detect unusual patterns of sales. The project provides the resulting data
and analyses, free of charge, to health departments nationwide. Future
plans include continued enrollment and support of health departments,
developing methods to make the service financially self-supporting, and
further refinement of the data collection system to reduce the time
latency of data receipt and analysis.
Tsui F-C, Espino JU,
Dato VM, Gesteland PH, Hutman J, Wagner
MM. Technical Description of RODS: A Real-time Public Health Surveillance System. Journal of the American Medical Informatics Association
10/5 (Sept/Oct) 399-408,
2003.
Summary:
This paper describes the design and implementation of the Real-time
Outbreak and Disease Surveillance (RODS) system, a computer-based public
health surveillance system for early detection of disease outbreaks.
Hospitals send data from clinical encounters over virtual private networks
and leased lines using the Health Level 7 (HL7) message protocol.
The data are sent in real time. RODS automatically classifies the
registration chief complaint from the visit into one of seven syndrome
categories using Bayesian classifiers. It stores the data in a
relational database; aggregates the data for analysis using data
warehousing techniques; applies univariate and multivariate statistical
detection algorithms to the data; and alerts users when the algorithms
identify anomalous patterns in the syndrome counts. RODS also has a
Web-based user interface that supports temporal and spatial analyses.
RODS processes sales of over-the-counter healthcare products in a similar
manner, but receives such data in batch mode on a daily basis.
RODS was used
during the 2002 Winter Olympics and currently operates in two states,
Pennsylvania and Utah. It has been and continues to be a resource
for implementing, evaluating, and applying new methods of public health
surveillance.
Chapman WW, Cooper GF,
Hanbury P, Chapman BE, Harrison LH, Wagner MM. Creating A Text
Classifier to Detect Radiology Reports Describing Mediastinal Findings
Associated with Inhalational Anthrax and Other Disorders.
Journal of the American Medical Informatics Association
10/5 (Sept/Oct) 494-503,
2003.
Summary:
Objective -- Create a classifier for automatic detection of chest
radiograph reports consistent with the mediastinal findings of
inhalational anthrax.
Design -- We used the IPS
system to create a keyword classifier for detecting reports describing
mediastinal findings consistent with anthrax and compared its
performance on a test set of 79,032 chest radiograph reports.
Measurements -- The area under the
ROC curve was the main outcome measure for the IPS classifier. We
calculated the sensitivity and specificity of an initial IPS model based on an
existing keyword search and compared it against a Boolean version of the
IPS classifier.
Results -- The IPS classifier
achieved an area under the ROC curve of 0.677 (90% CI = 0.628-0.772) with
a specificity of 0.99 and maximum sensitivity of 0.35. The initial
IPS model attained a specificity of 0.1 and a sensitivity of 0.04.
Conclusion -- The IPS system
is a useful tool for helping domain experts create a statistical keyword
classifier for textual reports that is a potentially useful component in
surveillance of radiographic findings suspicious for anthrax.
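The summary above reports area under the ROC curve, sensitivity, and specificity. To show how these measures relate, here is a small sketch on hypothetical labels and classifier scores (not the study's data): sensitivity and specificity at a single cutoff, and the ROC area computed as the probability that a randomly chosen positive report scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def sensitivity_specificity(labels, predictions):
    """labels/predictions are booleans (True = report has the finding)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """ROC area: P(positive score > negative score), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical reference labels and classifier scores for five reports.
labels = [True, True, False, False, False]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
predictions = [s >= 0.45 for s in scores]  # one arbitrary cutoff
print(sensitivity_specificity(labels, predictions))  # (0.5, 0.6666666666666666)
print(auc(labels, scores))                           # 0.8333333333333334
```

Sensitivity and specificity describe one operating point, whereas the ROC area summarizes ranking quality over all cutoffs, which is why a classifier can have high specificity at a chosen cutoff yet only a moderate ROC area.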
Gesteland PH, Gardner RM,
Tsui F-C, Espino JU, Rolfs RT, James BC, Wagner MM. Implementing
Automated Syndromic Surveillance for the 2002 Winter Olympics.
Journal of the American Medical Informatics Association 10/6 (Nov/Dec):
547-554, 2003.
See http://www.jamia.org/cgi/reprint/M1352v1
Summary: The key to minimizing the
effects of an intentionally caused disease outbreak is early detection.
This fact has prompted development of surveillance systems that use
real-time data to track syndrome counts within a region. We successfully
implemented such a computer-based, automated syndromic surveillance system
in Utah in just six weeks. Using free-text chief complaints captured in
emergency departments and walk-in clinics, we achieved approximately 70%
coverage of acute-care visits in a seven-county region with a population
of 1.8 million. The organizational issues outweighed the technical
issues. The motivation and cooperation inspired by the 2002 Winter
Olympics was a powerful driver in overcoming the organizational issues.
This paper aims to inform developers of public health surveillance systems
of our experience and suggest a framework for how these systems can be
deployed.
Hogan WR, Tsui F-C, Ivanov O, Gesteland P, Grannis S, Overhage JM, Robinson JM, Wagner MM. Detection of Pediatric Respiratory and Diarrheal
Outbreaks from Sales of Over-the-Counter Electrolyte Products.
Journal of the American Medical Informatics Association 10/6 (Nov/Dec):
555-562, 2003.
Summary: Using the exponentially weighted moving average (EWMA) control chart method, we measured the earliness, sensitivity, and specificity of OTC electrolyte product sales for predicting epidemics of pediatric illness. The dates of onset of the epidemics of pediatric illness were also determined using EWMA.
We found that sales of OTC electrolyte products for pediatric patients correlate well with hospital admissions of children for infectious diseases and precede admissions by 2.4 weeks (95% CI 0.1-4.8).
We conclude that OTC electrolyte sales are an early indicator of epidemics of disease in children.
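The EWMA control chart named in this summary can be sketched as follows. This is a generic one-sided (upper) EWMA chart, assuming a baseline window for estimating the mean and standard deviation, a smoothing weight lam = 0.3, and a limit width of k = 3 standard deviations; these settings and the toy sales series are illustrative assumptions, not the values used in the study.

```python
import math
import statistics

def ewma_alarms(series, baseline_len, lam=0.3, k=3.0):
    """Return indices after the baseline where the EWMA exceeds its upper limit.

    Mean and standard deviation are estimated from the first baseline_len
    points. lam (smoothing weight) and k (limit width) are assumed defaults.
    """
    baseline = series[:baseline_len]
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    # Upper control limit from the asymptotic standard deviation of the EWMA.
    limit = mu + k * sigma * math.sqrt(lam / (2 - lam))
    z = mu  # start the EWMA at the baseline mean
    alarms = []
    for t in range(baseline_len, len(series)):
        z = lam * series[t] + (1 - lam) * z
        if z > limit:
            alarms.append(t)
    return alarms

# Hypothetical daily unit sales: ten baseline days, then a surge.
sales = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 10, 11, 25, 30, 28]
print(ewma_alarms(sales, baseline_len=10))  # [12, 13, 14]
```

Smaller values of lam average over more past days, which damps day-to-day noise at the cost of reacting more slowly to a sudden rise; earliness measurements like those in the study depend directly on this trade-off.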
Wagner MM, Dato VM, Dowling
JN, Allswede M. Representative Threats for Research in Public Health
Surveillance. Journal of Biomedical Informatics 36: 177-188, 2003.
http://authors.elsevier.com/sd/article/S1532046403000650
Summary: The
number of biological agents that could be used by terrorists exceeds 100.
The size of this ‘problem space’ is a challenge for researchers, and the
number of ways in which each agent can be used increases the size of the
problem space even further.
In this paper, we address this problem by creating a parsimonious
characterization of the problem space. We cluster the threats into nine
categories based on their similarity as problems in detection. We also
identify one or more threats in each category that have occurred in recent
times and could be used by researchers as surrogates for non-occurrent
diseases.
We suggest that researchers consider our categories as a Criterion Set for
analysis and evaluation of detection systems. The categories characterize
the problem space in a tractable manner with less loss of generality than
analyses based on one or two selected diseases, the approach typical of
current analyses.
Wagner MM. Models of Computer-Based
Outbreak Detection. Reference Librarian
http://www.haworthpress.com/store/product.asp?sku=J120.
Summary: Despite intense development of computer-based surveillance
(CBS), there has been little work on models of such systems. This paper
describes a computational model of CBS: the output, the inputs, and the algorithms
required to create the output from the inputs. Starting from an assumed
requirement that an ideal CBS system be capable of mitigating a
bioterrorist outbreak of inhalational anthrax, it is shown that an
output of CBS should be a continuously updated probabilistic threat
assessment, which is used to trigger response actions at levels of threat
determined by cost-benefit analyses. The paper considers the types of
algorithmic transformations required to map from a full spectrum of types
of input data to the required output. The set of input data is much
broader than previously considered. The specific complementary
relationship between the computational model and the evolving
architectural model of the National Electronic Disease Surveillance System
(NEDSS) is discussed.
Mandl KD, Overhage JM,
Wagner MM, Lober WB, Sebastiani P, Mostashari F, Pavlin JA, Gesteland PH,
Treadwell T, Koski E, Hutwagner L, Buckeridge DL, Aller RD, Grannis S.
Implementing Syndromic Surveillance: A Practical Guide Informed by the
Early Experience. Journal of the American Medical Informatics Association, 2003
(In press).
Summary: Syndromic
surveillance refers to methods relying on detection of clinical case
features that are discernable before confirmed diagnoses are made.
In particular, prior to the laboratory confirmation of an infectious
disease, ill persons may exhibit behavioral patterns, symptoms, signs, or
laboratory findings that can be tracked through a variety of data sources.
Syndromic surveillance systems are being developed locally, regionally,
and nationally. The efforts have been largely directed at
facilitating the early detection of covert bioterrorist attacks, but the
technology may also be useful for general public health, clinical
medicine, quality improvement, patient safety, and research. This
paper, authored by developers and methodologists involved in the design
and deployment of the first wave of syndromic surveillance systems, is
intended to serve as a guide for informaticians, public health managers,
and practitioners who are currently planning deployment of such systems in
their regions.
Panackal AA,
M’Ikanatha NM, Tsui F-C, McMahon J, Wagner MM, Dixon BW, Zubieta J, Phelan
M, Mirza S, Morgan J, Jernigan D, Pasculle AW, Rankin JT Jr, Hajjeh RA,
Harrison LH. Automatic Electronic Laboratory-Based Reporting of Notifiable Infectious Diseases at a Large Health System. Emerging
Infectious Diseases 8/7: 685-691, 2002.
http://www.cdc.gov/ncidod/EID/vol8no7/01-0493-G3.htm
Summary: Electronic
laboratory-based reporting, developed by the UPMC Health System,
Pittsburgh, Pennsylvania, was evaluated to determine if it could be
integrated into the conventional paper-based reporting system. We
reviewed reports of ten infectious diseases from eight UPMC hospitals
that reported to the Allegheny County Health Department in southwestern
Pennsylvania during January 1-November 26, 2000. Electronic reports were
received a median of 4 days earlier than conventional reports. The
completeness of reporting was 74% (95% confidence interval [CI] 66% to
81%) for the electronic laboratory-based reporting and 65% (95% CI 57% to
73%) for the conventional paper-based reporting system (p>0.05). Most
reports missed by electronic laboratory-based reporting (88%) were caused
by the use of free text. Automatic reporting was more rapid and as complete as
conventional reporting. Using standardized coding and minimizing free
text usage will increase the completeness of electronic laboratory-based
reporting.
Teich JM, Wagner MM, Mackenzie CF, Schafer KO. The informatics response in disaster,
terrorism, and war. Journal
of the American Medical Informatics Association
9:97-104, 2002.
Summary:
The United States currently faces several new, concurrent
large-scale health crises as a result of terrorist activity.
In particular, three major health issues have risen sharply in
immediacy and public consciousness: bioterrorism, the threat of widespread
delivery of agents of illness; mass disasters, local events that produce
large numbers of casualties and overwhelm the usual capacity of the
healthcare delivery system; and the problem of delivering optimal health
care to remote military field sites. Each of these health issues carries
large demands for collection, analysis, coordination, and distribution of
health information.
In this article, we present an overview of each area and describe
ongoing work in each.
Lober WB, Karras BT, Wagner MM, Overhage JM, Davidson
AJ, Fraser H, Mandl KD, Espino JU, Tsui F-C.
Roundtable on bioterrorism detection: Information system-based
surveillance. Journal of the
American Medical Informatics Association 9:105-115, 2002.
Summary:
During the
2001 AMIA Annual Symposium, the Anesthesia, Critical Care, and Emergency
Medicine Working Group hosted a Roundtable on Bioterrorism (BT) Detection.
Sixty-four people attended the Roundtable, during which public
health surveillance systems designed to enhance early detection of BT
events were discussed by several researchers. These systems make secondary
use of existing clinical, laboratory, paramedical and pharmacy data, or
facilitate electronic case reporting by clinicians. This paper combines
case reports of 6 existing systems with discussion of some common
techniques and approaches.
The Roundtable’s goal was to foster communication among researchers and
promote progress by (1) sharing information about systems, including
origins, current capabilities, stages of deployment, and architectures;
(2) sharing lessons learned while developing and implementing systems; and
(3) exploring cooperation between projects, including sharing of software
and data.
A listserv for this effort can be found at http://bt.cirg.washington.edu.
Wagner MM. The Space Race and
Biodefense: Lessons From NASA About Big Science and the Role of Medical
Informatics. Journal of the American Medical Informatics Association
9:120-122, 2002.
http://www.jamia.org/
Summary: The events that
followed the launch of Sputnik on October 4, 1957 provide a metaphor for
the events that followed the October 4, 2001 announcement by health
officials of a case of pulmonary anthrax in the U.S.
This paper uses that metaphor to elucidate the nature of the task
ahead and to pose questions such as: Can the goals of the biodefense
effort be formulated as concisely and concretely as the goal of NASA?
Can we measure success in biodefense as we did for the space project?
Who are the equivalents of rocket engineers for biodefense?
Tsui F-C, Wagner MM, Dato VM, Chang CH. Value of ICD-9-Coded Chief Complaints for Detection
of Epidemics. Journal of the American Medical Informatics Association,
Special Issue on Enabling Patient Safety Through Informatics (Selected
Works from the 2001 AMIA Annual Symposium and Educational Curricula for
Enhancing Safety Awareness) 9/6 (Nov/Dec): S41-S47, 2002.
Summary: To assess the value of ICD-9-coded chief complaints for early
detection of epidemics, we measured the sensitivity, positive predictive
value, and timeliness of influenza detection using a respiratory set (RS)
of ICD-9 codes and an influenza set (IS). Timeliness was also measured
using the cross-correlation function. For influenza epidemics occurring in
Pittsburgh between December 1999 and December 2000, detection had a
sensitivity of 100% (3/3 epidemics) and positive predictive values of 75%
(3/4) for IS and 100% (3/3) for RS. The inherent timeliness of the data,
as shown by the cross-correlation function, was good; however, the
timeliness of detection was poor.
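This summary uses the cross-correlation function to gauge how far one time series leads another. A minimal sketch of that idea: compute the Pearson correlation between two equal-length series at each candidate lag and report the lag with the highest correlation. The function names, the sign convention (a positive lag means the first series leads), and the toy weekly counts are assumptions for illustration, not the paper's computation.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def best_lead(signal, reference, max_lag):
    """Return (lag, r) for the lag in [-max_lag, max_lag] with the highest r.

    Both series must have the same length. A positive lag means signal
    leads reference: signal[t] is paired with reference[t + lag].
    """
    n = len(signal)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = signal[:n - lag], reference[lag:]
        else:
            x, y = signal[-lag:], reference[:n + lag]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best

# Hypothetical weekly counts: the signal peaks two weeks before the reference.
reference = [0, 0, 0, 1, 3, 6, 3, 1, 0, 0, 0, 0]
signal = [0, 1, 3, 6, 3, 1, 0, 0, 0, 0, 0, 0]
lag, r = best_lead(signal, reference, max_lag=3)
print(lag)  # 2
```

A high peak correlation at a positive lag indicates that the data carry early information (the "inherent timeliness" above); it does not by itself show that a detection algorithm will raise an alarm that early.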
Wong W-K,
Moore AW, Cooper GF, Wagner MM. Rule-based Anomaly Pattern Detection for
Detecting Disease Outbreaks. In: Proceedings of the Eighteenth National
Conference on Artificial Intelligence (AAAI-02), Fourteenth Innovative
Applications of AI Conference (IAAI-02) (held in Edmonton, Alberta,
Canada, July 28-August 1, 2002). American Association for Artificial
Intelligence (AAAI) Press/The MIT Press, Menlo Park, CA, pp. 217-223,
2002. http://www.autonlab.org/pap.html#wsare
Summary: This paper presents an algorithm for performing early
detection of disease outbreaks by searching a database of Emergency
Department cases for anomalous patterns. Traditional techniques for
anomaly detection are unsatisfactory for this particular problem because
they identify individual data points that are rare due to their particular
combination of features. When applied to our scenario, these traditional
algorithms discover isolated outliers of particularly strange events, such
as someone accidentally shooting their ear, that are not indicative of a
new disease outbreak. Instead, we would like to detect anomalous patterns.
These patterns are groups with specific characteristics, such as elderly
males living in a particular neighbourhood, whose recent proportions are
anomalous relative to what their normal proportions should be. We propose
using a rule-based anomaly detection algorithm that characterizes each
anomalous pattern with a rule. The significance of each rule is carefully
evaluated using Fisher’s Exact Test and a randomization test. The
performance of our algorithm is compared against the standard algorithm by
measuring the number of false positives and the timeliness of detection.
Simulated data is used in the evaluation phase. This data was produced by
a simulator that simulates the effects of a disease outbreak on a city.
The results indicate that our algorithm has significantly better detection
times for common significance thresholds while having a slightly higher
false positive rate.
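The summary says the significance of each rule is evaluated with Fisher's Exact Test (followed by a randomization test, which is not shown here). Below is a minimal sketch of a one-sided Fisher's exact test on a hypothetical 2x2 table that compares recent records against baseline records, split by whether they match a rule such as "elderly males in a particular neighbourhood". The table layout and the counts are assumptions, not taken from the paper.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table

                  matches rule   does not match
      recent           a               b
      baseline         c               d

    i.e. P(X >= a) under the hypergeometric null with all margins fixed.
    """
    row1 = a + b   # recent records
    col1 = a + c   # records matching the rule
    n = a + b + c + d
    total = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / total
    return p

# Hypothetical counts: 20% of recent records match vs 5% of baseline records.
p_strange = fisher_one_sided(20, 80, 50, 950)
# Same 5% proportion in both periods: no evidence of anything unusual.
p_normal = fisher_one_sided(5, 95, 50, 950)
print(f"{p_strange:.2e} {p_normal:.2f}")
```

An exact test suits this setting because the recent-period counts for a narrowly defined group are often small, where large-sample approximations such as the chi-square test are unreliable.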
Wagner MM, Tsui F-C, Espino JU, Dato
VM, Sittig DF, Caruana RA, McGinnis LF, Deerfield DW, Druzdzel MJ, Fridsma
DB. The emerging science of very early detection of disease outbreaks.
Journal of Public Health Management and Practice 7(6): 51-59, 2001.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11710168&dopt=Abstract
Summary: The surge of development of new
public health surveillance systems designed to provide more timely
detection of outbreaks suggests that public health has a new requirement:
extreme timeliness of detection. The authors review previous work relevant
to measuring timeliness and to defining timeliness requirements. Using
signal detection theory and decision theory, the authors identify
strategies to improve timeliness of detection and position ongoing system
development within that framework.
Yasnoff WA, Overhage JM, Humphreys BL, LaVenture M, Goodman KW, Gatewood
L, Ross DA, Reid J, Hammond WE, Dwyer D, Huff SM, Gotham I, Kukafka R,
Loonsk JW, Wagner MM. A national agenda
for public health informatics. Journal of Public Health Management
and Practice 2001
Nov;7(6):1-21.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11713752&dopt=Abstract
Summary: The American
Medical Informatics Association 2001 Spring Congress brought together the
public health and informatics communities to develop a national agenda for
public health informatics. Discussions on funding and governance;
architecture and infrastructure; standards and vocabulary; research,
evaluation, and best practices; privacy, confidentiality, and security;
and training and workforce resulted in 74 recommendations with two key
themes: (1) all stakeholders need to be engaged in coordinated activities
related to public health information architecture, standards,
confidentiality, best practices, and research and (2) informatics training
is needed throughout the public health workforce. Implementation of this
consensus agenda will help promote progress in the application of
information technology to improve public health.
Book Chapters
Wagner MM, Espino JU, Tsui F-C, Aryel RM. Public Health Surveillance:
The Role of Clinical Information Systems. In: Ball MJ, Weaver CA, Kiel JM
(eds.) Healthcare Information Management Systems: Cases, Strategies, and
Solutions, Chapter 39. Health Informatics Series, 3rd Ed. Springer-Verlag, pp. 513-531, 2004.
Summary:
Recently, health departments have begun
collecting data from
hospitals in near real time for public health
surveillance. In New York City, Boston, and Washington, for example,
hospitals send daily reports of visits to the health department.
This new trend is driven by the need for early warning of surreptitious
biological attacks. The trend is likely to accelerate because
evidence is accumulating that these new approaches work: they can
detect outbreaks earlier than existing methods and even identify outbreaks
that have heretofore gone unnoticed.
This trend has
important implications for researchers and developers in clinical
informatics: It is creating new design requirements for clinical
information systems.
This chapter is
written for both the public health reader and the clinical informatics
reader. For clinical informaticians, it provides a primer on public
health surveillance, drawing on examples from our experience with the RODS
system, which is a regional public health surveillance system. For public
health workers, it provides a primer on clinical information systems, also
drawing on examples from our experience with the Health System Resident
Component (HSRC), a hospital-based component similar to a clinical event
monitor.
Refereed Conference Proceedings
Olszewski RT. Bayesian Classification of Triage Diagnoses for the Early
Detection of Epidemics. In: Recent Advances in Artificial Intelligence:
Proceedings of the Sixteenth International Florida Artificial Intelligence
Research Society (FLAIRS) Conference (held in St. Augustine, Florida, May
12-14, 2003), AAAI Press, Menlo Park, California, pp. 412-416, 2003.
Espino JU, Hogan WR, Wagner MM. Telephone
Triage: A Timely Data Source for Surveillance of Influenza-like Diseases.
Proceedings of the Annual Fall Symposium of the American Medical
Informatics Association, Omni Press CD, pp. 215-219, 2003.
Summary: We evaluated telephone triage (TT) data for public health early
warning systems. TT data is electronically available and contains coded
elements that include the demographics and description of a caller’s
medical complaints. In the study, we obtained emergency room TT data and
after hours TT data from a commercial TT software and service company. We
compared the timeliness of the TT data with influenza surveillance data
from the Centers for Disease Control using the cross correlation
function. Emergency room TT calls preceded the surveillance data collected
by the CDC by one to five weeks.
Ivanov O, Gesteland PH, Hogan WR, Mundorff MB, Wagner MM. Detection of
Pediatric Respiratory and Gastrointestinal Outbreaks from Free-Text Chief
Complaints. Proceedings of the Annual Fall Symposium of the
American Medical Informatics Association, Omni Press CD, pp. 318-322, 2003.
Summary: We conducted a retrospective study to ascertain the
potential of free-text chief complaints collected in pediatric emergency
departments to serve as surveillance data for early detection of
outbreaks.
We determined that automatically coded chief complaint data
provide a signal that reflects outbreaks in a population of children less
than five years of age. Using the Exponentially Weighted Moving Average (EWMA)
detection algorithm, we measured the timeliness, sensitivity, and
specificity of free-text chief complaints for predicting outbreaks of
pediatric respiratory and gastrointestinal illness.
We found that time series of automatically coded free-text chief
complaints in pediatric patients correlated well with hospital admissions
and preceded them by a mean of 10.3 days (95% CI -15.15 to 35.5) for
respiratory outbreaks and 29 days (95% CI 4.23 to 53.7) for
gastrointestinal outbreaks.
We conclude that free-text chief complaints may play an
important role as an early, sensitive, and specific indicator of outbreaks
of respiratory and gastrointestinal illness in children less than five
years of age.
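The EWMA detector named in this summary can be sketched as a textbook EWMA control chart over daily counts. This is not necessarily the configuration used in the study: the smoothing weight lam, the 14-day baseline, the 3-sigma limit, and the function name are all assumed for illustration.

```python
import math


def ewma_alarms(counts, lam=0.2, n_baseline=14, width=3.0):
    """Return indices of days whose EWMA statistic exceeds the upper control limit.

    The mean and standard deviation are estimated from the first `n_baseline`
    days (assumed outbreak-free); monitoring starts right after the baseline."""
    base = counts[:n_baseline]
    if len(base) < 2:
        raise ValueError("need at least two baseline days")
    mu = sum(base) / len(base)
    sd = math.sqrt(sum((x - mu) ** 2 for x in base) / (len(base) - 1))
    # Asymptotic standard deviation of the EWMA statistic: sd * sqrt(lam / (2 - lam)).
    ucl = mu + width * sd * math.sqrt(lam / (2 - lam))
    z = mu  # start the EWMA at the baseline mean
    alarms = []
    for t in range(n_baseline, len(counts)):
        z = lam * counts[t] + (1 - lam) * z  # exponentially weighted update
        if z > ucl:
            alarms.append(t)
    return alarms
```

With a baseline of 14 days alternating between 10 and 12 visits followed by [11, 11, 11, 20, 25, 30], the function returns [17, 18, 19]: the first alarm falls on the first day of elevated counts, because the smoothed statistic (12.8) already exceeds the limit (about 12.04).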
Zhang J, Tsui F-C, Wagner MM, Hogan WR. Detection of Outbreaks from Time
Series Data Using Wavelet Transform. Proceedings of the Annual Fall
Symposium of the American Medical Informatics Association, Omni Press CD,
pp. 748-752, 2003.
[PDF]
Summary: We developed a new approach to detecting disease outbreaks based
on the wavelet transform. It addresses two problems found in real-world
time series data, negative singularities and long-term trends, which may
degrade the performance of current approaches to outbreak detection. To
test this approach, we introduced simulated disease outbreaks and negative
singularities into a real-world dataset and applied our approach and two
other algorithms, autoregressive (AR) and Multi-resolution Wavelet
Autoregressive (MWAR), to this dataset. We compared the performance of
these algorithms in terms of
sensitivity, specificity and timeliness. The results showed that our
approach had similar sensitivity and specificity and slightly better
timeliness compared to the other two algorithms initially, but when we
introduced negative singularities, its performance did not degrade as
significantly as the other two algorithms' performance. We conclude that
our approach to detection, when compared to traditional approaches, is not
as susceptible to degradation of performance caused by negative
singularities.
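To illustrate how a wavelet decomposition can separate a long-term trend from short-term changes, the sketch below uses the Haar wavelet, whose level-J approximation is simply the mean of each dyadic block of 2**J samples. This is a hypothetical illustration of trend removal only. It is not the detection algorithm evaluated in the paper and does not model that algorithm's handling of negative singularities. The function names, level, and threshold are assumptions.

```python
def haar_approximation(x, level):
    """Level-`level` Haar approximation of x, reconstructed at full length.

    Projecting onto the Haar approximation space at level J replaces each
    dyadic block of 2**J samples with its mean, which retains the slow
    (long-term) component of the series."""
    block = 2 ** level
    if len(x) % block:
        raise ValueError("len(x) must be a multiple of 2**level")
    approx = []
    for start in range(0, len(x), block):
        segment = x[start:start + block]
        approx.extend([sum(segment) / block] * block)
    return approx


def flag_after_detrending(x, level, threshold):
    """Flag indices where the series rises more than `threshold` above its
    Haar trend. Only upward deviations are flagged, since an outbreak
    increases counts."""
    trend = haar_approximation(x, level)
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return [i for i, r in enumerate(residual) if r > threshold]
```

For a steadily rising series such as list(range(16)) with a spike of 20 added at index 10, flag_after_detrending(series, level=2, threshold=5.0) returns [10]: the linear trend leaves residuals of at most 1.5, while the spike stands out.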
Ma L, Tsui F-C,
Summary: We developed a Bayesian network model for detecting
pulmonary tuberculosis in hospitalized patients. The network was
constructed using objective probabilities from the literature and
subjective expert knowledge. The network comprised 19 variables, whose
values we extracted either automatically or manually from information in
electronic medical records.
We retrospectively evaluated the ability of the network to detect patients
with undiagnosed pulmonary tuberculosis at the time of admission. We
assigned variables to one of four categories depending on level of
difficulty of automatic extraction from our clinical information systems:
coded, coded with some free text, free text in radiological reports and
free text in admission reports. We also determined whether the
information in the clinical information systems was available within three
days of admission or later. We measured the network's performance with
increasing amounts of data ranging from just coded data available within
three days of admission to all available information at any time during
the admission.
The best performance (AUC = 0.954; 95% CI 0.905, 0.999) was achieved using
all available data. The poorest performance (AUC = 0.676; 95% CI 0.535,
0.817) was for Types I and II (coded + microbiological lab data)
available within the first three days of hospital stay. The difference in
area under the ROC curve from adding radiological data to coded and
microbiological lab data was significant both for data available within
the first three days of hospital stay (p = 0.006) and for the entire
hospital stay (p = 0.031). Using only data from radiology reports
available during the first three days of hospitalization, the AUC was
0.794 (95% CI 0.688, 0.901); using data from any radiology report, the AUC
was 0.906 (95% CI 0.839, 0.973).
We conclude that
the Bayesian network is a promising method for detecting tuberculosis
cases, that radiographic findings are important to detection
performance, and that either coding of radiographic findings or natural
language processing is necessary for timely and accurate detection.
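The AUC values reported in this summary can be read as the probability that a randomly chosen patient with tuberculosis receives a higher network score than a randomly chosen patient without it. A minimal sketch of that computation (the Mann-Whitney form of the AUC, with ties counted as one half), assuming the network's output probabilities are available as two lists of scores; the function name is hypothetical and the scores in the usage note are made up.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case scores higher, counting ties as one half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, auc_mann_whitney([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]) returns 8/9 (about 0.889), because eight of the nine positive/negative pairs are ranked correctly. This pairwise form is quadratic in the number of cases; a rank-based computation gives the same value more efficiently for large samples.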
Gesteland PH, Wagner MM, Chapman WW, Espino JU, Tsui F-C, Gardner RM,
Rolfs RT, Dato VM, James BC, Haug PJ. Rapid Deployment of an Electronic
Disease Surveillance System in the State of Utah for the 2002 Olympic
Winter Games. Journal of the American Medical Informatics Association,
Supplement issue on the Proceedings of the Annual Fall Symposium of the
American Medical Informatics Association Biomedical Informatics: One
Discipline (November 9-13, 2002, San Antonio, Texas), Kohane IS (ed),
Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 285-289, 2002.
[PDF]
Summary: The key to minimizing the effects of an intentionally caused
disease outbreak is early detection of the attack and rapid identification
of the affected individuals. The Bush administration’s leadership in
advocating for biosurveillance systems capable of monitoring for
bioterrorism attacks suggests that we should move quickly to establish a
nationwide early warning biosurveillance system as a defense against this
threat. The spirit of collaboration and unity inspired by the events of
9-11 and the 2002 Olympic Winter Games in Salt Lake City
provided the opportunity to demonstrate how a prototypic biosurveillance
system could be rapidly deployed. In seven weeks we were able to
implement an automated, real-time disease outbreak detection system in the
State of Utah and to monitor 80,684 acute care visits occurring during a
28-day period spanning the Olympics. No trends of immediate public health
concern were identified.
Ivanov O, Wagner MM, Chapman WW, Olszewski RT. Accuracy of Three
Classifiers of Acute Gastrointestinal Syndrome for Syndromic
Surveillance. Journal of the American Medical Informatics Association,
Supplement issue on the Proceedings of the Annual Fall Symposium of the
American Medical Informatics Association Bio+medical Informatics: One
Discipline (November 9-13, 2002, San Antonio, Texas), Kohane IS (ed),
Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 345-349, 2002.
[PDF]
Summary: ICD-9–coded emergency department (ED) diagnoses and free-text
triage diagnoses are routinely collected data elements that have potential
value for public health surveillance and early detection of epidemics.
We constructed and evaluated three classifiers for the detection of cases
of acute gastrointestinal syndrome of public health significance: one used
ICD-9–coded ED diagnosis as input data; the other two used free-text
triage diagnosis. We measured the performance of these classifiers against
the expert classification of cases based on review of ED reports. The
sensitivity of the ICD-9–code classifier was 0.32, and the specificity was
0.99. The sensitivity of a naïve Bayes classifier using triage diagnoses
was 0.63, the specificity was 0.94, and the area under the ROC curve was
0.82. A bigram Bayes classifier had sensitivity 0.38, specificity 0.94,
and area under the ROC of 0.69.
We conclude that a naive Bayes classifier of free-text triage diagnosis
data provides more sensitive and earlier detection of cases of acute
gastrointestinal syndrome than either a bigram Bayes classifier or an
ICD-9 code classifier. The sensitivity achieved should be sufficient for a
syndromic surveillance system designed to detect moderate to large
epidemics.
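A naive Bayes classifier of free-text triage diagnoses can be sketched as a multinomial model over word tokens with add-one (Laplace) smoothing. This is a generic sketch, not the classifier built in the study: the whitespace tokenizer, the smoothing choice, the class name, and the toy training strings in the usage note are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer on lowercased text (an assumed, simplistic choice)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over word tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior: log P(c) + sum of log P(w | c) per class."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / self.n_docs)
            for w in tokenize(text):
                # Counter returns 0 for unseen words; +1 smoothing avoids log(0).
                score += math.log((self.word_counts[c][w] + 1) / (total + v))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

Trained on the toy strings "vomiting", "diarrhea", and "nausea vomiting" (labeled "GI") and "cough", "chest pain", and "sore throat" (labeled "other"), predict("vomiting and diarrhea") returns "GI" and predict("cough") returns "other". Ranking cases by the GI log score, rather than taking only the top class, is what allows an ROC curve to be traced.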
Tsui F-C, Espino JU, Wagner MM, Gesteland PH, Ivanov O, Olszewski RT, Liu
Z, Zeng X, Chapman WW, Wong W-K, Moore AW. Data, Network, and
Application: Technical Description of the Utah
RODS Winter Olympic Biosurveillance System. Journal of the American
Medical Informatics Association, Supplement issue on the Proceedings of
the Annual Fall Symposium of the American Medical Informatics Association
Bio+medical Informatics: One Discipline (November 9-13, 2002, San
Antonio, Texas), Kohane IS (ed), Hanley & Belfus, Inc., Philadelphia,
Pennsylvania, pp. 815-819, 2002.
[PDF]
Summary: Given the post-September 11th climate of possible bioterrorist
attacks and the high-profile 2002 Winter Olympics in Salt Lake City, Utah,
we challenged ourselves to deploy a computer-based, real-time, automated
biosurveillance system for Utah, the Utah Real-time Outbreak and Disease
Surveillance system (Utah RODS), in six weeks using our existing Real-time
Outbreak and Disease Surveillance (RODS) architecture. During the Olympics,
Utah RODS received real-time HL7 admission messages from 10 emergency
departments and 20 walk-in clinics. It collected free-text chief
complaints, categorized them into one of seven prodrome classes using natural language
processing, and provided a web interface for real-time display of time
series graphs, geographic information system output, outbreak algorithm
alerts, and details of the cases. The system detected two possible
outbreaks that were dismissed as the natural result of increasing rates of
Influenza. Utah RODS allowed us to further understand the complexities
underlying the rapid deployment of a RODS-like system.
Tsui, F-C, Wagner, M.M., Dato, V.M., Chang, C.H. Value of ICD-9-Coded
Chief Complaints for Detection of Epidemics. Journal of the American
Medical Informatics Association, Supplement issue on the Proceedings of
the Annual Fall Symposium of the American Medical Informatics Association
(November 4-7, 2001, Washington, D.C.), Bakken S (ed), Hanley & Belfus,
Inc., Philadelphia, Pennsylvania, pp. 711-715, 2001.
[PDF]
(Full paper published in the Journal of the American Medical
Informatics Association Special Issue on Enabling Patient Safety Through
Informatics: Selected Works from the 2001 AMIA Annual Symposium and
Educational Curricula for Enhancing Safety Awareness 9/6 (Nov/Dec),
S41-S47, 2002. See above.)
Summary: To assess the value of ICD-9-coded chief complaints for early
detection of epidemics, we measured the sensitivity, positive predictive
value, and timeliness of Influenza detection using a respiratory set (RS)
of ICD-9 codes and an Influenza set (IS). We also measured the inherent
timeliness of the data using the cross-correlation function. For Influenza
epidemics occurring in Pittsburgh between December 1999 and December 2000,
detection had a sensitivity of 100% (3/3 epidemics) and positive
predictive values of 75% (3/4) for IS and 100% (3/3) for RS. The inherent
timeliness of the data shown by the cross-correlation function was good;
however, the timeliness of detection was poor.
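The sensitivity and positive predictive value figures in this summary follow directly from counts of detected epidemics, missed epidemics, and false alarms. A minimal sketch of the arithmetic, assuming the counts implied by the summary (for IS, 3 epidemics detected, none missed, and 1 false alarm among 4 alarms); the function name is hypothetical, and specificity is computed only when true negatives are supplied, since the summary does not report them.

```python
def detection_metrics(tp, fn, fp, tn=None):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP);
    specificity = TN / (TN + FP), only when true negatives are known."""
    metrics = {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
    }
    if tn is not None:
        metrics["specificity"] = tn / (tn + fp)
    return metrics
```

detection_metrics(tp=3, fn=0, fp=1) gives a sensitivity of 1.0 and a PPV of 0.75, matching the IS figures; detection_metrics(tp=3, fn=0, fp=0) gives a PPV of 1.0, matching RS.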
Zeng X, Wagner, M.M. Modeling the Effects of Epidemics on Routinely
Collected Data. Journal of the American Medical Informatics Association,
Supplement issue on the Proceedings of the Annual Fall Symposium of the
American Medical Informatics Association (November 4-7, 2001, Washington,
D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania,
pp. 781-785, 2001.
[PDF]
Summary: The use of routinely collected data, such as absenteeism, to
provide an early warning of an epidemic will depend on better
understanding of the effects of epidemics on such data. We reviewed
studies in behavioral medicine and health psychology in order to build a
model relating known factors in health-information-seeking and
treatment-seeking behavior to their effects on routinely collected data. This
review and modeling effort may be useful to researchers in early
detection, simulation, and response policy analysis.
Espino, J.U., Wagner, M.M. Accuracy of ICD-9-coded chief complaints
and diagnoses for the detection of acute respiratory illness. Journal of
the American Medical Informatics Association, Supplement issue on the
Proceedings of the Annual Fall Symposium of the American Medical
Informatics Association (November 4-7, 2001, Washington, D.C.), Bakken S
(ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 164-168,
2001.
[PDF]
Summary: ICD-9-coded chief complaints and diagnoses are a routinely
collected source of data with potential for use in public health
surveillance. We constructed two detectors of acute respiratory illness:
one based on ICD-9-coded chief complaints and one based on ICD-9-coded
diagnoses. We measured the classification performance of these detectors
against the human classification of cases based on review of emergency
department reports. Using ICD-9-coded chief complaints, the sensitivity of
detection of acute respiratory illness was 0.44 and its specificity was
0.97. The sensitivity and specificity using ICD-9-coded diagnoses were no
different. These properties of excellent specificity and moderate
sensitivity, coupled with the earliness and electronic availability of
such data, suggest that detectors based on ICD-9 coding of emergency
department chief complaints have a role in public health surveillance.
Chapman, W.W., Bridewell, W., Hanbury, P., Cooper, G.F., and
Buchanan, B.G. Evaluation of Negation Phrases in Narrative Clinical
Reports. Journal of the American Medical Informatics Association,
Supplement issue on the Proceedings of the Annual Fall Symposium of the
American Medical Informatics Association (November 4-7, 2001, Washington,
D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania,
pp. 105-109, 2001. (Nominated for Best Paper Award.)
[PDF]
Summary: The objective was to evaluate the use of negation phrases and the
frequency of negation in free-text clinical reports. A simple negation
algorithm was applied to ten types of clinical reports (n=42,160) dictated
during July 2000. We counted how often each of 66 negation phrases was
used to mark a clinical observation as absent. Physicians read a random
sample of 400 sentences, and precision was calculated for the negation
phrases. We also measured the proportion of clinical observations marked
as absent. Sixty negation phrases were triggered by the
negation algorithm, but just seven of them accounted for 90% of the
negations. The negation phrases received an overall precision of 97%, with
"not" earning the lowest precision of 63%. Between 39% and 83% of all
clinical observations were identified as absent by the negation algorithm,
depending on the type of report analyzed. The most frequently used
clinical observations were negated the majority of the time.
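A simple negation algorithm of the kind evaluated here can be sketched as follows: an observation is marked absent when a negation phrase ends shortly before it in the same sentence. The phrase list is a small hypothetical subset (the 66 phrases studied are not listed in this summary), and the five-token window and function names are assumptions.

```python
import re

# Hypothetical subset of negation phrases; the study evaluated 66 phrases.
NEGATION_PHRASES = ["no", "not", "denies", "without", "negative for", "no evidence of"]


def tokenize(text):
    """Lowercase word tokens (punctuation dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_starts(tokens, phrase):
    """Start indices where the (possibly multi-word) phrase occurs in tokens."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def is_negated(sentence, finding, window=5):
    """True if some mention of `finding` follows a negation phrase with fewer
    than `window` tokens between them; False if the finding is mentioned but
    never negated; None if the finding is not mentioned at all."""
    tokens = tokenize(sentence)
    mentions = phrase_starts(tokens, finding)
    if not mentions:
        return None
    for start in mentions:
        for phrase in NEGATION_PHRASES:
            length = len(phrase.split())
            for neg_start in phrase_starts(tokens, phrase):
                gap = start - (neg_start + length)  # tokens between phrase and finding
                if 0 <= gap < window:
                    return True
    return False
```

For example, is_negated("The patient denies chest pain.", "chest pain") returns True, while is_negated("Patient reports cough and fever.", "fever") returns False. Counting which phrase triggered each True result is how per-phrase frequencies and precision, as reported above, could be tallied.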
Wagner MM, Tsui F-C, Pike J, Pike L. Design of a
clinical notification system. In: Lorenzi NM (ed), Proceedings of the
1999 Annual Fall Symposium of the American Medical Informatics
Association, “Transforming Health Care Through Informatics: Cornerstones
for a New Information Management Paradigm” (Washington, D.C., November
6-10, 1999), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp.
989-993, 1999.
[PDF]
Summary: We describe the requirements and design of an enterprise-wide
notification system. From published descriptions of notification schemes,
our own experience, and use cases provided by diverse users in our
institution, we developed a set of functional requirements. The resulting
design supports multiple communication channels, third-party mappings
(algorithms) from message to recipient and/or channel of delivery, and
escalation algorithms. A requirement for multiple message formats is
addressed by a document specification. We implemented this system in Java
as a CORBA object. This paper describes the design and current
implementation of our notification system.
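Escalation can be sketched as trying delivery channels for each recipient, in priority order, until one delivery is acknowledged. This sketch is in Python rather than the Java/CORBA of the actual implementation. It treats each send as a synchronous call that returns an acknowledgment (a real system would wait on timeouts), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Channel:
    """A delivery channel; `send(recipient, message)` returns True if acknowledged."""
    name: str
    send: Callable[[str, str], bool]


def notify_with_escalation(
    message: str, recipients: List[str], channels: List[Channel]
) -> Tuple[Optional[str], Optional[str], List[Tuple[str, str]]]:
    """Try each recipient in priority order and, for each, each channel in
    order, stopping at the first acknowledged delivery. Returns the
    acknowledging recipient and channel (or None, None) plus the attempt log."""
    attempts: List[Tuple[str, str]] = []
    for recipient in recipients:
        for channel in channels:
            attempts.append((recipient, channel.name))
            if channel.send(recipient, message):
                return recipient, channel.name, attempts
    return None, None, attempts
```

If a pager channel never acknowledges and an email channel acknowledges only for "dr_b", notifying ["dr_a", "dr_b"] escalates through four attempts and ends with ("dr_b", "email"). Passing the recipient and channel lists in from outside mirrors the design's separation of message-to-recipient mappings from delivery.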
Wagner MM. A Review of
Federal Bioterrorism Preparedness Programs: Building an Early Warning
Public Health Surveillance System. Testimony given to the Hearing of the
Oversight and Investigations Subcommittee of the House Committee on Energy
and Commerce, Washington, D.C., November 2, 2001. See
http://energycommerce.house.gov/107/hearings/11012001Hearing
Summary: A problem that current research is addressing is early detection
of an outbreak caused by large-scale aerosol release of Anthrax, where
detection must occur within days of release to allow time for response and
treatment to occur. A product of the research is RODS (Real-time Outbreak
and Disease Surveillance), an early warning system that has been deployed
in Western Pennsylvania for two years. A key feature of RODS is that it receives data
directly and without delay from computers in emergency departments and
hospitals. Early detection is achieved in RODS by identifying patients
early in the disease process when they have nonspecific symptoms such as
cough or diarrhea, and then using brute-force computer power to find any
interesting patterns among the sick individuals that would suggest that an
unusual outbreak is occurring.
Recommendations to the Subcommittee include: (1) Congress should
provide funding directly to all ongoing metropolitan and regional efforts
to build early warning capability, provided they adhere to National
Electronic Disease Surveillance System standards; and (2) Congress should
provide funding for basic and applied research in early warning systems
for biological threats. The funding should be directed to
interdisciplinary Centers of Excellence. A focus of the research must be
identifying better ways to obtain needed data.
Federal funding for this research comes from the National Library of
Medicine, the Agency for Healthcare Research and Quality, the Centers for
Disease Control and Prevention, and the Defense Advanced Research Projects
Agency.