RODS PUBLICATIONS

FULL CITATIONS AND SUMMARIES

 

Refereed Journals

 

Wagner MM, Espino JU, Tsui F-C, Gesteland P, Chapman WW, Ivanov O, Moore AW, Wong WK, Dowling J, Hutman J.  Syndrome and Outbreak Detection from Chief Complaint Data: Experience of the Real-Time Outbreak and Disease Surveillance Project.  Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003.  Submitted by invitation to the CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 28-31, 2004. [PDF] or see http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a7.htm

 

Summary:  Introduction: This paper summarizes the experience of the Real-time Outbreak and Disease Surveillance (RODS) project with collection and analysis of free-text emergency department chief complaints.

     Objectives: A goal of the RODS project was to create a regional testbed that collects and analyzes, for at least one type of data, a health status indicator that might provide early warning of disease outbreaks, especially for the detection of large cohort exposures such as one resulting from an aerosolized anthrax release.  A second goal was to study detectability of outbreaks using temporal and spatial statistical analysis of such data.

     Methods: In the United States, chief complaints of patients registering for service in emergency rooms are recorded electronically in registration computer systems as part of normal workflow.  The registration computer then immediately transmits chief complaint data in the form of HL7 messages to other computers in the health system via an HL7 message router.  HL7 message routers de-identify these messages and forward the de-identified messages in real time via the Internet to a center serving the region.  At the center, a Bayesian classifier automatically assigns each chief complaint to one of eight syndromic categories.  The category data are analyzed by time series algorithms and also presented to users using displays like time lines and maps.  Several experiments were conducted to validate these data.

     Results:  The project demonstrated feasibility of such data collection at the greater than 50% level in large-scale deployments in Pennsylvania, Utah, and Ohio.  Experiments demonstrated that the Bayesian classifier can discriminate between different syndromic presentations and that the performance of the classifier was adequate to support accurate and timely detection of large seasonal outbreaks of disease. 

     Conclusions:  Monitoring of chief complaints is highly feasible, and early evidence suggests that there is signal that can be exploited through temporal-spatial analysis to detect large outbreaks (or smaller outbreaks that are more spatially confined).  Future directions for research include additional experiments to better understand the limits of detectability from this type of data as well as experiments using additional types of data (e.g., temperatures) to increase the accuracy of classification and to enable classification of patients into more specific syndromes.
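The classification step described in the Methods above can be illustrated with a minimal naive Bayes sketch. This is not the RODS classifier: the training phrases, the three syndrome labels (the paper uses eight), and the function names `train` and `classify` are all hypothetical, chosen only to show how a Bayesian classifier might map a free-text chief complaint to a syndromic category.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (chief complaint, syndrome label).
# These examples and labels are illustrative, not the RODS training set.
TRAINING = [
    ("cough and fever", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("vomiting and diarrhea", "gastrointestinal"),
    ("abdominal pain nausea", "gastrointestinal"),
    ("rash on arms", "rash"),
    ("itchy rash", "rash"),
]


def train(examples):
    """Count labels and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("fever with cough", model))
```

A production classifier would be trained on many thousands of labeled complaints and would need to handle abbreviations and misspellings, which this sketch does not attempt.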

 

Espino JU, Wagner MM, Szczepaniak MC, Tsui F-C, Su H, Olszewski R, Liu Z, Zeng X, Ma L, Lu Z, Dara J.  Removing a Barrier to Computer-Based Outbreak and Disease Surveillance: The RODS Open Source Project.  Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003.  CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 32-39, 2004.  [PDF] or see http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a8.htm

 

Summary:  Introduction: A key requirement of computer-based outbreak and disease surveillance is the availability of high quality software that is well-supported and affordable.   The process of software development in an open source framework—which entails free distribution and use of software and community-based continuous software development—can produce software with such characteristics, and can do so rapidly.

     Objectives:  The objective of the Real-time Outbreak and Disease Surveillance (RODS) Open Source Project is to accelerate the deployment of computer-based outbreak and disease surveillance systems by (1) writing software, and (2) catalyzing the formation of a community of users, developers, consultants, companies and  scientists that support open source public health surveillance  software.

     Methods: The University of Pittsburgh seeded the open source project by releasing the RODS software under the GNU General Public License.  We created an open-source-project infrastructure consisting of a website, mailing lists for the developers and users, selected individuals who serve as lead software developers, and shared code-development tools.  These resources are intended to encourage growth of a project community.  We use the following metrics to measure the progress of the project:  usage of the website, number of software downloads, number of inquiries, number of system deployments, and number of new features or modules added to the code base.

     Results: Between September and November 2003, there were 5,370 page views of the project website, 59 downloads of the software, 20 inquiries, and one new deployment, and we added four new features.

     Conclusions: Our three-month experience has been that health departments and companies are more interested at present in using the software “as is” than in customization and/or the development  of new features.  Such interest satisfies our primary goal of accelerating the deployment of such systems.  We expect that after the initial effort of installation is completed, these health departments and companies will start to customize the software and contribute these enhancements to the public code base.

 

Wagner MM, Tsui F-C, Espino JU, Hogan WR, Hutman J, Hersh J, Neill D, Moore AW, Parks G, Lewis C, Aller R.  The National Retail Data Monitor for Public Health Surveillance.  Presented at the National Syndromic Surveillance Conference, New York Academy of Medicine, New York, October 20-24, 2003.  CDC’s Morbidity and Mortality Weekly Report 53 (Supplement 1) 40-42, 2004. [PDF] or see http://www.cdc.gov/mmwr/preview/mmwrhtml/su5301a9.htm

 

Summary:  Introduction: The National Retail Data Monitor (NRDM) is a public health surveillance tool that collects and analyzes daily sales data for over-the-counter (OTC) healthcare products.

     Objectives: The objectives of the NRDM are to collect 70% of all sales data for selected OTC healthcare products in near real time and to redistribute them to public health officials in raw or analyzed form.

     Methods: Retailers transmit daily sales data encoded using Universal Product Bar Codes (“UPC codes”) to the NRDM server facility.  The UPC-level data are then aggregated into 18 analytic categories and analyzed by spatial and temporal algorithms to detect anomalous levels of sales that might indicate a disease outbreak.  The data are also redistributed in raw form to those health departments that request the data.

     Results:  At present, the NRDM receives data from more than 15,000 stores and has an additional 4,000 stores under agreement representing an estimated 40% market share nationally (with higher coverage in large metropolitan areas).   The data are received with one-day latency or less from time of sale.  The NRDM has created more than 295 accounts for users in 41 states. Eight jurisdictions receive raw data aggregated to the zip-code level.   Future plans include continued recruitment of retailers, reduction of time latency, deployment in other countries, broadening of scope to include prescription antibiotics and other selected prescription medications, research use of the data, and identifying a permanent source of funding.

     Conclusions:  The NRDM is one of the first examples of a national data utility for public health surveillance--collecting, redistributing, and analyzing daily sales volume data of selected healthcare products.  A single national level data-utility reduces the effort for both the data providers and health departments.
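The aggregation step in the Methods above (UPC-level sales rolled up into analytic categories) might look like the following sketch. The UPC values, the two category names, and the record layout `(date, zip_code, upc, units)` are assumptions made for illustration; the summary does not list the 18 NRDM categories or the actual data format.

```python
from collections import defaultdict

# Hypothetical UPC-to-category mapping. The real NRDM uses 18 analytic
# categories, which the summary does not enumerate.
UPC_TO_CATEGORY = {
    "000000000011": "cough/cold",
    "000000000028": "cough/cold",
    "000000000035": "antidiarrheal",
}

# Hypothetical daily sales records: (date, zip_code, upc, units_sold).
SAMPLE = [
    ("2003-10-01", "15213", "000000000011", 3),
    ("2003-10-01", "15213", "000000000028", 2),
    ("2003-10-01", "15213", "000000000035", 1),
    ("2003-10-01", "15213", "999999999999", 7),  # product not monitored
]


def aggregate_daily_sales(records):
    """Sum units sold per (date, zip code, category), skipping unmapped UPCs."""
    totals = defaultdict(int)
    for date, zip_code, upc, units in records:
        category = UPC_TO_CATEGORY.get(upc)
        if category is None:
            continue
        totals[(date, zip_code, category)] += units
    return dict(totals)


daily = aggregate_daily_sales(SAMPLE)
for key, units in sorted(daily.items()):
    print(key, units)
```

Aggregating to the zip-code level, as here, matches the granularity the Results describe for jurisdictions that receive raw data.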

 

Dato VM, Wagner MM, Fapohunda A.  How Outbreaks of Infectious Disease are Detected: A Review of Surveillance Systems and Outbreaks.  Public Health Reports 119 (Sept/Oct): 464-471, 2004  http://www.publichealthreports.org/article/PIIS0033354904001153/abstract (can download fulltext PDF from this site)

 

Summary:  Objective:  To learn how outbreaks are detected and describe the entities and information systems that together function to identify outbreaks in the United States.

Methods: We reviewed multiple sources of information to create a practical description of existing surveillance systems and how they interact to detect outbreaks. The results of this study were summarized in a system diagram.  To understand how the elements interact, we examined a sample of recent outbreaks to determine how they were detected with reference to the system diagram.  

Results: The system diagram for the de facto U.S. outbreak detection system consists of five components: the clinical health care system, local/state health agencies, federal agencies, academic/professional organizations and collaborating governmental jurisdictions. Primary data collection occurs at the clinical health care systems, and local health agencies. The outbreaks studied showed all five components aggregating, analyzing and sharing data.

Conclusions: The current U.S. approach to detection of disease outbreaks is complex and involves many organizations interacting in a loosely coupled manner.  State and local health departments and the health care system are major components in detection of outbreaks.

 

Wong W-K, Moore AW, Cooper GF, Wagner MM.  WSARE: What's Strange About Recent Events.  Journal of Urban Health: Bulletin of the New York Academy of Medicine 80/2 (Supplement 1) 66-75, 2003.  [PDF]

 

Summary:  This article presents an algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns.  Traditional techniques for anomaly detection are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features.  Thus, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new outbreak.  Instead, we would like to detect groups with specific characteristics that have a recent pattern of illness that is anomalous relative to historical patterns.  We propose using an anomaly detection algorithm that would characterize each anomalous pattern with a rule.  The significance of each rule would be carefully evaluated using the Fisher exact test and a randomization test.  In this study, we compared our algorithm with a standard detection algorithm by measuring the number of false positives and the timeliness of detection.  Simulated data, produced by a simulator that creates the effects of an epidemic on a city, were used for evaluation.  The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.
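To make the rule-scoring step concrete, here is a minimal sketch of a two-sided Fisher exact test on a 2x2 table comparing how often a rule matches recent cases versus baseline cases. It covers only the Fisher step; the randomization test the summary mentions is not shown. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: recent cases, baseline cases.
    Columns: cases matching the rule, cases not matching the rule.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x matches in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 50 recent cases match a rule such as
# "respiratory syndrome in a given zip code", versus 20 of 400 baseline cases.
p_value = fisher_exact_two_sided(12, 38, 20, 380)
print(f"p = {p_value:.6f}")
```

Because many candidate rules are scored against the same data, a raw p-value like this one overstates significance, which is presumably why the authors pair it with a randomization test.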

Wagner MM, Robinson JM, Tsui F-C, Espino JU, Hogan WR.  Design of a National Retail Data Monitor for Public Health Surveillance.  Journal of the American Medical Informatics Association 10/5 (Sept/Oct) 409-418, 2003.   http://www.jamia.org/

Summary:  The National Retail Data Monitor receives data daily from 10,000 stores, including pharmacies that sell health care products.  These stores belong to national chains that process sales data centrally and utilize Universal Product Codes and scanners to collect sales information at the cash register.  The high degree of retail sales automation enables the monitor to collect information from thousands of store locations in near real time for use in public health surveillance.  The monitor provides user interfaces that display summary sales data on timelines and maps.  Algorithms monitor the data automatically on a daily basis to detect unusual patterns of sales.  The project provides the resulting data and analyses, free of charge, to health departments nationwide.  Future plans include continued enrollment and support of health departments, developing methods to make the service financially self-supporting, and further refinement of the data collection system to reduce the time latency of data receipt and analysis.

Tsui F-C, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM.  Technical Description of RODS: A Real-time Public Health Surveillance System.   Journal of the American Medical Informatics Association 10/5 (Sept/Oct) 399-408, 2003.  http://www.jamia.org/

(Note: This paper was selected for feature on MDLinx, Friday, October 17, 2003.  MDLinx is an online report on the Top Hospital Administration Articles, which features free clinical updates to hundreds of thousands of physicians and healthcare professionals throughout the world.)

Summary:  This paper describes the design and implementation of the Real-time Outbreak and Disease Surveillance (RODS) system, a computer-based public health surveillance system for early detection of disease outbreaks.  Hospitals send data from clinical encounters over virtual private networks and leased lines using the Health Level 7 (HL7) message protocol.  The data are sent in real time.  RODS automatically classifies the registration chief complaint from the visit into one of seven syndrome categories using Bayesian classifiers.  It stores the data in a relational database; aggregates the data for analysis using data warehousing techniques; applies univariate and multivariate statistical detection algorithms to the data; and alerts users when the algorithms identify anomalous patterns in the syndrome counts.  RODS also has a Web-based user interface that supports temporal and spatial analyses.  RODS processes sales of over-the-counter healthcare products in a similar manner, but receives such data in batch mode on a daily basis.

     RODS was used during the Winter 2002 Olympics and currently operates in two states-- Pennsylvania and Utah.  It has been and continues to be a resource for implementing, evaluating, and applying new methods of public health surveillance.

Chapman WW, Cooper GF, Hanbury P, Chapman BE, Harrison LH, Wagner MM.  Creating A Text Classifier to Detect Radiology Reports Describing Mediastinal Findings Associated with Inhalational Anthrax and Other Disorders.  Journal of the American Medical Informatics Association 10/5 (Sept/Oct) 494-503, 2003.  http://www.jamia.org/

Summary:  Objective -- Create a classifier for automatic detection of chest radiograph reports consistent with the mediastinal findings of inhalational anthrax.

   Design -- We used the IPS system to create a keyword classifier for detecting reports describing mediastinal findings consistent with anthrax and compared their performance on a test set of 79,032 chest radiograph reports.

   Measurements -- Area under the ROC curve was the main outcome measure of the IPS classifier.  We calculated sensitivity and specificity of an initial IPS model based on an existing keyword search and compared it against a Boolean version of the IPS classifier.

   Results -- The IPS classifier received an area under the ROC curve of 0.677 (90% CI = 0.628-0.772) with a specificity of 0.99 and maximum sensitivity of 0.35.  The initial IPS model attained a specificity of 0.1 and a sensitivity of 0.04.

   Conclusion -- The IPS system is a useful tool for helping domain experts create a statistical keyword classifier for textual reports that is a potentially useful component in surveillance of radiographic findings suspicious for anthrax.

 

Gesteland PH, Gardner RM, Tsui F-C, Espino JU, Rolfs RT, James BC, Wagner MM.  Implementing Automated Syndromic Surveillance for the 2002 Winter Olympics.  Journal of the American Medical Informatics Association 10/6 (Nov/Dec): 547-554, 2003.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12925547&dopt=Abstract

 

or http://www.jamia.org/cgi/reprint/M1352v1

 

Summary:  The key to minimizing the effects of an intentionally caused disease outbreak is early detection.  This fact has prompted development of surveillance systems that use real-time data to track syndrome counts within a region.  We successfully implemented such a computer-based, automated syndromic surveillance system in Utah in just six weeks.  Using free-text chief complaints captured in emergency departments and walk-in clinics, we achieved an approximate 70% coverage of acute-care visits in a seven-county region with a population of 1.8 million.  The organizational issues outweighed the technical issues.  The motivation and cooperation inspired by the 2002 Winter Olympics was a powerful driver in overcoming the organizational issues. This paper aims to inform developers of public health surveillance systems of our experience and suggest a framework for how these systems can be deployed.

 

Hogan WR, Tsui F-C, Ivanov I, Gesteland P, Grannis S, Overhage JM, Robinson JM, Wagner MM.  Detection of Pediatric Respiratory and Diarrheal Outbreaks from Sales of Over-the-Counter Electrolyte Products.  Journal of the American Medical Informatics Association 10/6 (Nov/Dec): 555-562, 2003.

http://www.jamia.org/

 

Summary:  We studied retrospectively the correlation between pediatric hospital admissions and retail sales of over-the-counter electrolyte products in six noncontiguous urban regions.

     Using the exponentially weighted moving average (EWMA) control chart method, we measured the earliness, sensitivity, and specificity of OTC electrolyte product sales for predicting epidemics of pediatric illness.  The dates of onset of the epidemics of pediatric illness were determined using EWMA as well.

     We found that OTC electrolyte sales in pediatric patients correlate well with hospital admissions of children for infectious diseases and precede admissions by 2.4 weeks (95% C.I. 0.1-4.8).

     We conclude that OTC electrolyte sales are an early indicator of epidemics of disease in children.
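The EWMA control chart method named in this summary can be sketched as follows. The smoothing weight `lam`, the threshold multiplier `k`, the baseline length, and the weekly sales figures are all assumptions for illustration; the summary does not report the parameters the authors used.

```python
from statistics import mean, stdev


def ewma_alarms(series, baseline_len, lam=0.3, k=3.0):
    """Return indices where the EWMA statistic exceeds the upper control limit.

    The first `baseline_len` points estimate the in-control mean and standard
    deviation; lam and k are illustrative defaults, not values from the paper.
    """
    baseline = series[:baseline_len]
    mu, sigma = mean(baseline), stdev(baseline)
    # Upper control limit using the asymptotic variance of the EWMA statistic
    limit = mu + k * sigma * (lam / (2 - lam)) ** 0.5
    z = mu  # start the EWMA at the baseline mean
    alarms = []
    for i, x in enumerate(series[baseline_len:], start=baseline_len):
        z = lam * x + (1 - lam) * z
        if z > limit:
            alarms.append(i)
    return alarms


# Hypothetical weekly unit sales of electrolyte products; the last three
# weeks simulate a surge.
weekly_sales = [100, 104, 98, 101, 97, 103, 99, 102, 100, 140, 160, 150]
print(ewma_alarms(weekly_sales, baseline_len=8))
```

Running the same function over both the sales series and the admissions series, then comparing the first alarm dates, is one way to estimate how far sales lead admissions.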

 

Wagner MM, Dato VM, Dowling JN, Allswede M.  Representative Threats for Research in Public Health Surveillance.  Journal of Biomedical Informatics 36: 177-188, 2003.

http://authors.elsevier.com/sd/article/S1532046403000650

 

Summary:  The number of biological agents that could be used by terrorists exceeds 100.  The size of this ‘problem space’ is a challenge for researchers, and the number of ways that each agent can be used increases the size of the problem space even further.

     In this paper, we address this problem by creating a parsimonious characterization of the problem space.  We cluster the threats into nine categories based on their similarity as problems in detection.  We also identify one or more threats in each category that have occurred in recent times and could be used by researchers as surrogates for non-occurrent diseases. 

     We suggest that researchers consider our categories as a Criterion Set for analysis and evaluation of detection systems.  The categories characterize the problem space in a tractable manner with less loss of generality than analyses based on one or two selected diseases, which is representative of current analyses.

 

Wagner MM.  Models of Computer-Based Outbreak Detection.  Reference Librarian 79/80:343-362, 2003. 

http://www.haworthpress.com/store/product.asp?sku=J120.

Summary:  Despite intense development of computer-based surveillance (CBS), there has been little work on models of such systems.  This paper describes a computational model of CBS:  Output, inputs, and algorithms required to create the output from the inputs.  Starting from an assumed requirement that an ideal CBS system be capable of mitigating a bioterroristic outbreak of inhalational Anthrax, it is shown that an output of CBS should be a continuously updated probabilistic threat assessment, which is used to trigger response actions at levels of threat determined by cost-benefit analyses.  The paper considers the types of algorithmic transformations required to map from a full spectrum of types of input data to the required output.  The set of input data is much broader than previously considered.  The specific complementary relationship between the computational model and the evolving architectural model of the National Electronic Disease Surveillance System (NEDSS) is discussed.

Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, Pavlin JA, Gesteland PH, Treadwell T, Koski E, Hutwagner L, Buckeridge DL, Aller RD, Grannis S.  Implementing Syndromic Surveillance: A Practical Guide Informed by the Early Experience. Journal of the American Medical Informatics Association, 2003 (In press).

Summary:   Syndromic surveillance refers to methods relying on detection of clinical case features that are discernible before confirmed diagnoses are made.  In particular, prior to the laboratory confirmation of an infectious disease, ill persons may exhibit behavioral patterns, symptoms, signs, or laboratory findings that can be tracked through a variety of data sources.  Syndromic surveillance systems are being developed locally, regionally, and nationally.  The efforts have been largely directed at facilitating the early detection of covert bioterrorist attacks, but the technology may also be useful for general public health, clinical medicine, quality improvement, patient safety, and research.  This paper, authored by developers and methodologists involved in the design and deployment of the first wave of syndromic surveillance systems, is intended to serve as a guide for informaticians, public health managers, and practitioners who are currently planning deployment of such systems in their regions.

 

Panackal AA, M’Ikanatha NM, Tsui F-C, McMahon J, Wagner MM, Dixon BW, Zubieta J, Phelan M, Mirza S, Morgan J, Jernigan D, Pasculle AW, Rankin JT Jr, Hajjeh RA, Harrison LH.  Automatic Electronic Laboratory-Based Reporting of Notifiable Infectious Diseases at a Large Health System.  Emerging Infectious Diseases 8/7: 685-691, 2002.

http://www.cdc.gov/ncidod/EID/vol8no7/01-0493-G3.htm

 

Summary:  Electronic laboratory-based reporting, developed by the UPMC Health System, Pittsburgh, Pennsylvania, was evaluated to determine if it could be integrated into the conventional paper-based reporting system.  We reviewed reports of ten infectious diseases from eight UPMC hospitals that reported to the Allegheny County Health Department in southwestern Pennsylvania during January 1-November 26, 2000.  Electronic reports were received a median of 4 days earlier than conventional reports.  The completeness of reporting was 74% (95% confidence interval [CI] 66% to 81%) for the electronic laboratory-based reporting and 65% (95% CI 57% to 73%) for the conventional paper-based reporting system (p>0.05).  Most reports (88%) missed by electronic laboratory-based reporting were caused by using free text.  Automatic reporting was more rapid and as complete as conventional reporting.  Using standardized coding and minimizing free text usage will increase the completeness of electronic laboratory-based reporting.

 

Teich JM, Wagner MM, Mackenzie CF, Schafer KO. The informatics response in disaster, terrorism, and war.   Journal of the American Medical Informatics Association 9:97-104, 2002.

http://www.jamia.org/

Summary:  The United States currently faces several new, concurrent large-scale health crises as a result of terrorist activity.  In particular, three major health issues have risen sharply in immediacy and public consciousness: bioterrorism, the threat of widespread delivery of agents of illness; mass disasters, local events that produce large numbers of casualties and overwhelm the usual capacity of the healthcare delivery system; and the problem of delivering optimal health care to remote military field sites. Each of these health issues carries large demands for collection, analysis, coordination, and distribution of health information.  In this article, we present overviews and ongoing work efforts in each area.  

Lober WB, Karras BT, Wagner MM, Overhage JM, Davidson AJ, Fraser H, Mandl KD, Espino JU, Tsui F-C.  Roundtable on bioterrorism detection: Information system-based surveillance.  Journal of the American Medical Informatics Association 9:105-115, 2002.  http://www.jamia.org/ 

Summary: During the 2001 AMIA Annual Symposium, the Anesthesia, Critical Care, and Emergency Medicine Working Group hosted a Roundtable on Bioterrorism (BT) Detection.  Sixty-four people attended the Roundtable, during which public health surveillance systems designed to enhance early detection of BT events were discussed by several researchers. These systems make secondary use of existing clinical, laboratory, paramedical and pharmacy data, or facilitate electronic case reporting by clinicians. This paper combines case reports of 6 existing systems with discussion of some common techniques and approaches.  The Roundtable’s goal was to foster communication among researchers and promote progress by (1) sharing information about systems, including origins, current capabilities, stages of deployment, and architectures, (2) sharing lessons learned while developing and implementing systems, and (3) exploring cooperation between projects, including sharing of software and data.  A listserve for this effort can be found at http://bt.cirg.washington.edu

Wagner MM.  The Space Race and Biodefense: Lessons From NASA About Big Science and the Role of Medical Informatics.  Journal of the American Medical Informatics Association 9:120-122, 2002.  http://www.jamia.org/

Summary:  The events that followed the launch of Sputnik on October 4, 1957 provide a metaphor for the events that followed the October 4, 2001 announcement by health officials of a case of pulmonary anthrax in the U.S.  This paper uses that metaphor to elucidate the nature of the task ahead, and to suggest questions such as:  Can the goals of the biodefense effort be formulated as concisely and concretely as the goal of NASA? Can we measure success in biodefense as we did for the space project? and Who are the equivalent of rocket engineers for biodefense?

Zeng and Wagner. Modeling the Effects of Epidemics on Routinely Collected Data. JAMIA, Special Issue on Enabling Patient Safety Through Informatics (Selected Works from the 2001 AMIA Annual Symposium and Educational Curricula for Enhancing Safety Awareness)  9/6 (Nov/Dec): S17-S22, 2002.  [Summary] http://www.jamia.org/

Summary: The use of routinely collected data, such as absenteeism, to provide an early warning of an epidemic will depend on better understanding of the effects of epidemics on such data. We reviewed studies in behavioral medicine and health psychology in order to build a model relating known factors related to human health information and treatment seeking behavior and effects on routinely collected data. This review and modeling effort may be useful to researchers in early detection, simulation, and response policy analysis.

Tsui, F-C, Wagner, M.M., Dato V.M., Chang, C.H. Value of ICD-9-Coded Chief Complaints for Detection of Epidemics. Journal of the American Medical Informatics Association, Special Issue on Enabling Patient Safety Through Informatics (Selected Works from the 2001 AMIA Annual Symposium and Educational Curricula for Enhancing Safety Awareness)  9/6 (Nov/Dec): S41-S47, 2002.

http://www.jamia.org/

Summary: To assess the value of ICD-9-coded chief complaints for early detection of epidemics, the sensitivity, positive predictive value, and timeliness of Influenza detection using a respiratory set (RS) of ICD-9 codes and an Influenza set (IS) were measured. Timeliness using the cross-correlation function was also measured. For Influenza epidemics occurring in Pittsburgh between December 1999 and December 2000, the detection had a sensitivity of 100% (3/3 epidemics) and positive predictive values of 75% (3/4) for IS and 100% (3/3) for RS. The inherent timeliness of the data shown by the cross correlation function was good; however, the timeliness of detection was poor.
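The figures in this summary can be reproduced from the counts it reports, and the cross-correlation idea can be sketched with a simple lag search. The helper names and the two toy time series below are hypothetical; only the 3/3 and 3/4 counts come from the summary.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real epidemics that were detected."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of alarms that corresponded to a real epidemic."""
    return true_pos / (true_pos + false_pos)


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def best_lead(signal, reference, max_lag):
    """Lag (in time steps) at which `signal` best predicts `reference`."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair signal[t] with reference[t + lag]
        scores[lag] = pearson(signal[:len(signal) - lag], reference[lag:])
    return max(scores, key=scores.get)


# Counts reported in the summary: the Influenza set (IS) raised 4 alarms,
# 3 of them true; the respiratory set (RS) raised 3 alarms, all true.
print(sensitivity(3, 0))                # all 3 epidemics detected
print(positive_predictive_value(3, 1))  # IS
print(positive_predictive_value(3, 0))  # RS

# Hypothetical weekly counts in which the coded chief complaints peak two
# weeks before the reference epidemic curve.
complaints = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1, 1, 1]
reference = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
print(best_lead(complaints, reference, max_lag=3))
```

The distinction the summary draws follows from this: a high correlation at a positive lag shows the data carry early signal, but whether an alarm fires early depends separately on the detection algorithm and its threshold.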

Wong W-K, Moore AW, Cooper GF, Wagner MM.  Rule-based Anomaly Pattern Detection for Detecting Disease Outbreaks.  In: Proceedings of The Eighteenth National Conference on Artificial Intelligence (AAAI-02), Fourteenth Innovative Applications of AI Conference (IAAI-02) (held in Edmonton, Alberta, Canada, July 28-August 1, 2002).  American Association for Artificial Intelligence (AAAI) Press/The MIT Press, Menlo Park, CA, pp. 217-223, 2002. http://www.autonlab.org/pap.html#wsare  [PDF]

Summary:  This paper presents an algorithm for performing early detection of disease outbreaks by searching a database of Emergency Department cases for anomalous patterns. Traditional techniques for anomaly detection are unsatisfactory for this particular problem because they identify individual data points that are rare due to their particular combination of features. When applied to our scenario, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new disease outbreak. Instead, we would like to detect anomalous patterns. These patterns are groups with specific characteristics, such as elderly males living in a particular neighbourhood, whose recent proportions are anomalous based on what their normal proportions should be. We propose using a rule-based anomaly detection algorithm that characterizes each anomalous pattern with a rule. The significance of each rule is carefully evaluated using Fisher’s Exact Test and a randomization test. The performance of our algorithm is compared against the standard algorithm by measuring the number of false positives and the timeliness of detection. Simulated data is used in the evaluation phase. This data was produced by a simulator that simulates the effects of a disease outbreak on a city. The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.

Wagner MM, Tsui F-C, Espino JU, Dato VM, Sittig DF, Caruana RA, McGinnis LF, Deerfield DW, Druzdzel MJ, Fridsma DB. The emerging science of very early detection of disease outbreaks. Journal of Public Health Management and Practice 7(6): 51-59, 2001.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11710168&dopt=Abstract

Summary: The surge of development of new public health surveillance systems designed to provide more timely detection of outbreaks suggests that public health has a new requirement: extreme timeliness of detection. The authors review previous work relevant to measuring timeliness and to defining timeliness requirements. Using signal detection theory and decision theory, the authors identify strategies to improve timeliness of detection and position ongoing system development within that framework.

Yasnoff WA, Overhage JM, Humphreys BL, LaVenture M, Goodman KW, Gatewood L, Ross DA, Reid J, Hammond WE, Dwyer D, Huff SM, Gotham I, Kukafka R, Loonsk JW, Wagner MM.  A national agenda for public health informatics. Journal of  Public Health Management and  Practice 2001 Nov;7(6):1-21. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11713752&dopt=Abstract

 

Summary:  The American Medical Informatics Association 2001 Spring Congress brought together the public health and informatics communities to develop a national agenda for public health informatics. Discussions on funding and governance; architecture and infrastructure; standards and vocabulary; research, evaluation, and best practices; privacy, confidentiality, and security; and training and workforce resulted in 74 recommendations with two key themes: (1) all stakeholders need to be engaged in coordinated activities related to public health information architecture, standards, confidentiality, best practices, and research and (2) informatics training is needed throughout the public health workforce. Implementation of this consensus agenda will help promote progress in the application of information technology to improve public health.

 

 

Book Chapters

 

Wagner MM, Espino JU, Tsui F-C, Aryel RM.  Public Health Surveillance: The Role of Clinical Information Systems.  In: Ball MJ, Weaver CA, Kiel JM (eds.) Healthcare Information Management Systems: Cases, Strategies, and Solutions, Chapter 39.  Health Informatics Series, 3rd Ed.  Springer-Verlag, pp. 513-531, 2004. [PDF]

 

Summary:  Recently, health departments have begun collecting data from hospitals in near real time for public health surveillance.  In New York City, Boston, and Washington, for example, hospitals send daily reports of visits to the health department.

      This new trend is driven by the need for early warning of surreptitious biological attacks.  The trend is likely to accelerate because evidence is accumulating that these new approaches work—that they can detect outbreaks earlier than existing methods and even identify outbreaks that have heretofore gone unnoticed.

     This trend has important implications for researchers and developers in clinical informatics:  It is creating new design requirements for clinical information systems. 

     This chapter is written for both the public health reader and the clinical informatics reader.  For clinical informaticians, it provides a primer on public health surveillance, drawing on examples from our experience with the RODS system, which is a regional public health surveillance system.  For public health workers, it provides a primer on clinical information systems, also drawing on examples from our experience with the Health System Resident Component (HSRC), a hospital-based component similar to a clinical event monitor.
 

Refereed Conference Proceedings

 

Olszewski RT.  Bayesian Classification of Triage Diagnoses for the Early Detection of Epidemics.  In: Recent Advances in Artificial Intelligence: Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society (FLAIRS) Conference (held in St. Augustine, Florida, May 12-14, 2003), AAAI Press, Menlo Park, California, pp. 412-416, 2003. [PDF]

 

Summary:  The distribution of illnesses reported by emergency departments from hospitals in a region under surveillance is particularly informative for the early detection of epidemics. The most direct source of data for construction of such a distribution is the final diagnoses of patients being seen in the emergency departments, but the delay in their availability impinges on the requirement that detection be timely. Free-text descriptions of patients’ symptoms, called triage diagnoses, and ICD-9 values that encode the symptoms are entered when patients are admitted and, consequently, are timelier sources of data. An experiment to evaluate the accuracy of Bayesian classification of triage diagnoses into syndromes (i.e., illness categories) was performed, resulting in areas under the ROC curve (AUC) between .80 and .97 for the various syndromes. The classification accuracies using triage diagnoses surpass the classification accuracies using ICD-9 codes reported by previous studies. Triage diagnoses, therefore, are a more accurate source of data than ICD-9 codes for the early detection of epidemics. [http://www.flairs.com/flairs2003/index.html]

Espino JU, Hogan WR, Wagner MM.  Telephone Triage: A Timely Data Source for Surveillance of Influenza-like Diseases.  Proceedings of the Annual Fall Symposium of the American Medical Informatics Association, Omni Press CD, pp. 215-219, 2003.  [PDF]

Summary:  We evaluated telephone triage (TT) data for public health early warning systems.  TT data are electronically available and contain coded elements that include the demographics and description of a caller’s medical complaints.   In the study, we obtained emergency room TT data and after-hours TT data from a commercial TT software and service company.  We compared the timeliness of the TT data with influenza surveillance data from the Centers for Disease Control using the cross-correlation function.  Emergency room TT calls were one to five weeks ahead of surveillance data collected by the CDC.
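The timeliness comparison described above can be illustrated with a minimal sketch of a lagged cross-correlation. This is not the authors' code: the function names, the toy weekly counts, and the choice of Pearson correlation at each non-negative lag are assumptions made for illustration only.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def best_lead(early, reference, max_lag):
    """Return (lag, r) where correlating early[t] with reference[t + lag] is highest.

    A positive lag means the 'early' series leads the reference series by that
    many time steps (for example, weeks).
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        x = early[: len(early) - lag]   # drop the tail that has no partner
        y = reference[lag:]             # shift the reference series back by 'lag'
        n = min(len(x), len(y))
        r = pearson(x[:n], y[:n])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical weekly counts: the reference series repeats the early series two weeks later.
tt_calls = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
cdc_data = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1]
lag, r = best_lead(tt_calls, cdc_data, max_lag=4)
```

With these toy series the highest correlation occurs at a lag of two weeks, which is the sense in which one data source is said to be "ahead" of another.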

Ivanov O, Gesteland PH, Hogan WR, Mundorff MB, Wagner MM.  Detection of Pediatric Respiratory and Gastrointestinal Outbreaks from Free-Text Chief Complaints.   Proceedings of the Annual Fall Symposium of the American Medical Informatics Association, Omni Press CD, pp. 318-322, 2003. [PDF]

Summary:  We conducted a retrospective study to ascertain the potential of free-text chief complaints collected in pediatric emergency departments to serve as surveillance data for early detection of outbreaks.

     We determined that automatically coded chief complaint data provide a signal that reflects outbreaks in a population of children less than five years of age.  Using the Exponentially Weighted Moving Average (EWMA) detection algorithm, we measured the timeliness, sensitivity, and specificity of free-text chief complaints for predicting outbreaks of pediatric respiratory and gastrointestinal illness.

   We found that time series of automatically coded free-text chief complaints in pediatric patients correlate well with hospital admissions and precede them by a mean of 10.3 days (95% CI -15.15, 35.5) for respiratory outbreaks and 29 days (95% CI 4.23, 53.7) for gastrointestinal outbreaks.

     We conclude that free-text chief complaints may play an important role as an early, sensitive, and specific indicator of outbreaks of respiratory and gastrointestinal illness in children less than five years of age.
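For readers unfamiliar with EWMA-based detection, the following is a minimal sketch of the general technique, not the configuration used in the study. The smoothing weight, the control-limit multiplier, the length of the baseline window, and the daily counts are all illustrative assumptions.

```python
from statistics import mean, pstdev

def ewma_alarms(counts, baseline_days=7, lam=0.4, k=3.0):
    """Return the day indices on which the EWMA statistic exceeds its upper limit.

    The baseline mean and standard deviation are estimated from the first
    'baseline_days' values (an assumption of this sketch). The upper limit uses
    the asymptotic EWMA standard deviation, sigma * sqrt(lam / (2 - lam)).
    """
    base = counts[:baseline_days]
    mu, sigma = mean(base), pstdev(base)
    limit = mu + k * sigma * (lam / (2 - lam)) ** 0.5
    z = mu                      # start the smoothed statistic at the baseline mean
    alarms = []
    for day in range(baseline_days, len(counts)):
        z = lam * counts[day] + (1 - lam) * z   # exponentially weighted update
        if z > limit:
            alarms.append(day)
    return alarms

# Hypothetical daily counts of one syndrome; the last three days simulate an outbreak.
daily_counts = [10, 12, 11, 9, 10, 11, 10, 10, 11, 25, 30, 28]
alarm_days = ewma_alarms(daily_counts)
```

Timeliness can then be expressed as the gap between the first alarm day and the date a reference standard (such as hospital admissions) first signals the same outbreak.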

Zhang J, Tsui F-C, Wagner MM, Hogan WR.  Detection of Outbreaks from Time Series Data Using Wavelet Transform.  Proceedings of the Annual Fall Symposium of the American Medical Informatics Association, Omni Press CD, pp. 748-752, 2003. [PDF] 

Summary:  In this paper, we developed a new approach to detection of disease outbreaks based on wavelet transform. It is capable of dealing with two problems found in real-world time series data, namely, negative singularity and long-term trends, which may degrade the performance of current approaches to outbreak detection. To test this approach, we introduced simulated disease outbreaks and negative singularities into a real world dataset and applied it and two other algorithms—autoregressive (AR) and Multi-resolution Wavelet Auto-regressive (MWAR) — to this dataset. We compared the performance of these algorithms in terms of sensitivity, specificity and timeliness. The results showed that our approach had similar sensitivity and specificity and slightly better timeliness compared to the other two algorithms initially, but when we introduced negative singularities, its performance did not degrade as significantly as the other two algorithms' performance. We conclude that our approach to detection, when compared to traditional approaches, is not as susceptible to degradation of performance caused by negative singularities.

Ma L, Tsui F-C, Hogan WR, Wagner MM, Ma H.  A Framework for Infection Control Surveillance Using Association Rules.  Proceedings of the Annual Fall Symposium of the American Medical Informatics Association, Omni Press CD, pp. 410-414, 2003. [PDF]

Summary:  We developed a Bayesian network model for detecting pulmonary tuberculosis in hospitalized patients. The network was constructed using objective probabilities from the literature and subjective expert knowledge. The network comprised 19 variables, whose values we extracted either automatically or manually from information in electronic medical records.

   We retrospectively evaluated the ability of the network to detect patients with undiagnosed pulmonary tuberculosis at the time of admission.  We assigned variables to one of four categories depending on level of difficulty of automatic extraction from our clinical information systems: coded, coded with some free text, free text in radiological reports and free text in admission reports.  We also determined whether the information in the clinical information systems was available within three days of admission or later. We measured the network's performance with increasing amounts of data ranging from just coded data available within three days of admission to all available information at any time during the admission.

The best performance (AUC=0.954, 95% CI 0.905, 0.999) was achieved using all available data.  The poorest performance (AUC=0.676; 95% CI 0.535, 0.817) was for Types I and II (coded + microbiological lab data) available within the first three days of hospital stay. We found that adding radiological data to coded and micro lab data made a significant difference in the area under the curve, both in the first three days of hospital stay (p = 0.006) and over the entire hospital stay (p = 0.031). We also found that using only data from radiology reports available during the first three days of hospitalization, the AUC is 0.794 (95% CI: 0.688, 0.901), and using data from any radiology report the AUC is 0.906 (95% CI: 0.839, 0.973).

     We conclude that the Bayesian network is a promising method for tuberculosis case detection, that radiographic findings are important to detection performance, and that either coding of radiographic findings or natural language processing is necessary for timely and accurate detection.

Gesteland PH, Wagner MM, Chapman WW, Espino JU, Tsui F-C, Gardner RM, Rolfs RT, Dato VM, James BC, Haug PJ.  Rapid Deployment of an Electronic Disease Surveillance System in the State of Utah for the 2002 Olympic Winter Games.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association Biomedical Informatics: One Discipline (November 9-13, 2002, San Antonio, Texas), Kohane IS (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 285-289, 2002. [PDF]

Summary:   The key to minimizing the effects of an intentionally caused disease outbreak is early detection of the attack and rapid identification of the affected individuals.  The Bush administration’s leadership in advocating for biosurveillance systems capable of monitoring for bioterrorism attacks suggests that we should move quickly to establish a nationwide early warning biosurveillance system as a defense against this threat.  The spirit of collaboration and unity inspired by the events of 9-11 and the 2002 Olympic Winter Games in Salt Lake City provided the opportunity to demonstrate how a prototypic biosurveillance system could be rapidly deployed.  In seven weeks we were able to implement an automated, real-time disease outbreak detection system in the State of Utah and monitored 80,684 acute care visits occurring during a 28-day period spanning the Olympics.  No trends of immediate public health concern were identified.

Ivanov O, Wagner MM, Chapman WW, Olszewski RT.  Accuracy of Three Classifiers of Acute Gastrointestinal Syndrome for Syndromic Surveillance.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association Bio+medical Informatics: One Discipline (November 9-13, 2002, San Antonio, Texas), Kohane IS (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 345-349, 2002. [PDF]

Summary:  ICD-9–coded emergency department (ED) diagnoses and free-text triage diagnoses are routinely collected data elements that have potential value for public health surveillance and early detection of epidemics.

We constructed and evaluated three classifiers for the detection of cases of acute gastrointestinal syndrome of public health significance: one used ICD-9–coded ED diagnosis as input data; the other two used free-text triage diagnosis. We measured the performance of these classifiers against the expert classification of cases based on review of ED reports.  The sensitivity of the ICD-9–code classifier was 0.32, and the specificity was 0.99. The sensitivity of a naïve Bayes classifier using triage diagnoses was 0.63, the specificity was 0.94, and the area under the ROC curve was 0.82.  A bigram Bayes classifier had sensitivity 0.38, specificity 0.94, and area under the ROC of 0.69.

We conclude that a naïve Bayes classifier of free-text triage diagnosis data provides more sensitive and earlier detection of cases of acute gastrointestinal syndrome than either a bigram Bayes classifier or an ICD-9 code classifier. The sensitivity achieved should be sufficient for a syndromic surveillance system designed to detect moderate to large epidemics.
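As background on the classifier type named above, here is a minimal sketch of a word-based naïve Bayes text classifier with add-one (Laplace) smoothing. It is a generic textbook form, not the classifier evaluated in the paper; the training phrases, labels, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}

def classify(model, text):
    """Return the label with the highest log posterior under the naive Bayes assumption."""
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n in model["class_counts"].items():
        score = math.log(n / total)                       # log prior
        denom = sum(model["word_counts"][label].values()) + vocab_size
        for word in text.lower().split():
            # add-one smoothing so unseen words do not zero out the product
            score += math.log((model["word_counts"][label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical triage phrases; real training data would come from expert-labeled cases.
training = [
    ("cough fever", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("vomiting diarrhea", "gastrointestinal"),
    ("abdominal pain vomiting", "gastrointestinal"),
]
model = train_naive_bayes(training)
label = classify(model, "fever and cough")
```

A bigram variant would count adjacent word pairs instead of single words; the paper's comparison suggests that this did not help on short triage strings.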

 

Tsui F-C, Espino JU, Wagner MM, Gesteland PH, Ivanov O, Olszewski RT, Liu Z, Zeng X, Chapman WW, Wong W-K, Moore AW.  Data, Network, and Application: Technical Description of the Utah RODS Winter Olympic Biosurveillance System.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association Bio+medical Informatics: One Discipline (November 9-13, 2002, San Antonio, Texas), Kohane IS (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 815-819, 2002.  [PDF]

Summary:  Given the post-September 11th climate of possible bioterrorist attacks and the high-profile 2002 Winter Olympics in Salt Lake City, Utah, we challenged ourselves to deploy a computer-based real-time automated biosurveillance system for Utah, the Utah Real-time Outbreak and Disease Surveillance system (Utah RODS), in six weeks using our existing Real-time Outbreak and Disease Surveillance (RODS) architecture. During the Olympics, Utah RODS received real-time HL-7 admission messages from 10 emergency departments and 20 walk-in clinics.  It collected free-text chief complaints, categorized them into one of seven prodrome classes using natural language processing, and provided a web interface for real-time display of time series graphs, geographic information system output, outbreak algorithm alerts, and details of the cases. The system detected two possible outbreaks that were dismissed as the natural result of increasing rates of Influenza. Utah RODS allowed us to further understand the complexities underlying the rapid deployment of a RODS-like system.

Tsui F-C, Wagner MM, Dato VM, Chang CH.  Value of ICD-9-Coded Chief Complaints for Detection of Epidemics.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association (November 4-7, 2001, Washington, D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 711-715, 2001. [PDF]

(Full paper published in the Journal of the American Medical Informatics Association Special Issue on Enabling Patient Safety Through Informatics: Selected Works from the 2001 AMIA Annual Symposium and Educational Curricula for Enhancing Safety Awareness  9/6 (Nov/Dec), S41-S47, 2002. See above.)

Summary: To assess the value of ICD-9-coded chief complaints for early detection of epidemics, the sensitivity, positive predictive value, and timeliness of Influenza detection using a respiratory set (RS) of ICD-9 codes and an Influenza set (IS) were measured. Timeliness using the cross-correlation function was also measured. For Influenza epidemics occurring in Pittsburgh between December 1999 and December 2000, the detection had a sensitivity of 100% (3/3 epidemics) and positive predictive values of 75% (3/4) for IS and 100% (3/3) for RS. The inherent timeliness of the data shown by the cross correlation function was good; however, the timeliness of detection was poor.
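The percentages above follow directly from the reported fractions. A small worked sketch, with the counts read off those fractions (three true epidemics all detected, one false alarm for IS, none for RS); the function names are illustrative:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real epidemics that were detected."""
    return true_pos / (true_pos + false_neg)

def positive_predictive_value(true_pos, false_pos):
    """Fraction of detections that corresponded to a real epidemic."""
    return true_pos / (true_pos + false_pos)

# Counts implied by the fractions in the summary: 3/3 detected, 3/4 (IS) and 3/3 (RS) correct alarms.
sens = sensitivity(true_pos=3, false_neg=0)
ppv_is = positive_predictive_value(true_pos=3, false_pos=1)
ppv_rs = positive_predictive_value(true_pos=3, false_pos=0)
```

With only three epidemics in the study period, each of these estimates rests on very small counts.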

Zeng X, Wagner MM.  Modeling the Effects of Epidemics on Routinely Collected Data.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association (November 4-7, 2001, Washington, D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 781-785, 2001. [PDF]

Summary: The use of routinely collected data, such as absenteeism, to provide an early warning of an epidemic will depend on better understanding of the effects of epidemics on such data. We reviewed studies in behavioral medicine and health psychology in order to build a model linking known factors in human health information-seeking and treatment-seeking behavior to their effects on routinely collected data. This review and modeling effort may be useful to researchers in early detection, simulation, and response policy analysis.

Espino JU, Wagner MM.  Accuracy of ICD-9-coded chief complaints and diagnoses for the detection of acute respiratory illness.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association (November 4-7, 2001, Washington, D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 164-168, 2001. [PDF]

Summary: ICD-9-coded chief complaints and diagnoses are a routinely collected source of data with potential for use in public health surveillance. We constructed two detectors of acute respiratory illness: one based on ICD-9-coded chief complaints and one based on ICD-9-coded diagnoses. We measured the classification performance of these detectors against the human classification of cases based on review of emergency department reports. Using ICD-9-coded chief complaints, the sensitivity of detection of acute respiratory illness was 0.44 and its specificity was 0.97. The sensitivity and specificity using ICD-9-coded diagnoses were no different. These properties of excellent specificity and moderate sensitivity, coupled with the earliness and electronic availability of such data, suggest that detectors based on ICD-9 coding of emergency department chief complaints have a role in public health surveillance.

Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG.  Evaluation of Negation Phrases in Narrative Clinical Reports.  Journal of the American Medical Informatics Association, Supplement issue on the Proceedings of the Annual Fall Symposium of the American Medical Informatics Association (November 4-7, 2001, Washington, D.C.), Bakken S (ed), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 105-109, 2001.  (Nominated for Best Paper Award.) [PDF]

Summary: The objective was to evaluate the use of negation phrases and the frequency of negation in free-text clinical reports. A simple negation algorithm was applied to ten types of clinical reports (n=42,160) dictated during July 2000. The number of times each of 66 negation phrases was used to mark a clinical observation as absent was counted. A random sample of 400 sentences was read by physicians, and precision was calculated for the negation phrases. The proportion of clinical observations marked as absent was also measured. Sixty negation phrases were triggered by the negation algorithm, but just seven of them accounted for 90% of the negations. The negation phrases received an overall precision of 97%, with "not" earning the lowest precision of 63%. Between 39% and 83% of all clinical observations were identified as absent by the negation algorithm, depending on the type of report analyzed. The most frequently used clinical observations were negated the majority of the time.
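The summary does not spell out the negation algorithm, so the following is only a rough sketch of one simple phrase-based approach: an observation is marked absent when a negation phrase appears within a few words before it. The phrase list, the window size, the single-word matching, and the function name are all assumptions for illustration and are not taken from the paper.

```python
import re

# Illustrative subset only; the study evaluated 66 negation phrases.
NEGATION_PHRASES = ["no", "denies", "without", "not"]

def negated_observations(sentence, observations, window=3):
    """Map each observation found in the sentence to the negation phrase that
    precedes it within 'window' words, or to None if no phrase precedes it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    result = {}
    for obs in observations:
        if obs not in tokens:
            continue                      # observation not mentioned at all
        idx = tokens.index(obs)
        preceding = tokens[max(0, idx - window): idx]
        result[obs] = next((p for p in NEGATION_PHRASES if p in preceding), None)
    return result

findings = negated_observations(
    "Patient denies fever but reports cough.", ["fever", "cough"]
)
```

Here "fever" is marked absent by "denies" while "cough" is left as present. Widening the window would wrongly negate "cough" as well, which hints at why a broad trigger such as "not" can have much lower precision than the others.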

Wagner MM, Tsui F-C, Pike J, Pike L.  Design of a clinical notification system.  In: Lorenzi NM (ed), Proceedings of the 1999 Annual Fall Symposium of the American Medical Informatics Association, “Transforming Health Care Through Informatics: Cornerstones for a New Information Management Paradigm” (Washington, D.C., November 6-10, 1999), Hanley & Belfus, Inc., Philadelphia, Pennsylvania, pp. 989-993, 1999. [PDF]

Summary:  We describe the requirements and design of an enterprise-wide notification system. From published descriptions of notification schemes, our own experience, and use cases provided by diverse users in our institution, we developed a set of functional requirements. The resulting design supports multiple communication channels, third party mappings (algorithms) from message to recipient and/or channel of delivery, and escalation algorithms. A requirement for multiple message formats is addressed by a document specification. We implemented this system in Java as a CORBA object.  This paper describes the design and current implementation of our notification system.

 

Published Testimony

Wagner MM.  A Review of Federal Bioterrorism Preparedness Programs: Building an Early Warning Public Health Surveillance System.  Testimony given to the Hearing of the Oversight and Investigations Subcommittee of the House Committee on Energy and Commerce, Washington, D.C., November 2, 2001.  See http://energycommerce.house.gov/107/hearings/11012001Hearing406/Wagner684.htm

 

Summary:  A problem that current research is addressing is early detection of an outbreak caused by large-scale aerosol release of Anthrax, where detection must occur within days of release to allow time for response and treatment to occur.  A product of the research is RODS (Real-time Outbreak and Disease Surveillance), an early warning system that has been deployed in Western Pennsylvania for two years.  A key feature of RODS is that it receives data directly and without delay from computers in emergency departments and hospitals.  Early detection is achieved in RODS by identifying patients early in the disease process when they have nonspecific symptoms such as cough or diarrhea, and then using brute-force computer power to find any interesting patterns among the sick individuals that would suggest that an unusual outbreak is occurring.

   Recommendations to the Subcommittee include: (1) Congress should provide funding directly to all ongoing metropolitan and regional efforts to build early warning capability, provided they adhere to National Electronic Disease Surveillance System standards; and (2) Congress should provide funding for basic and applied research in early warning systems for biological threats.  The funding should be directed to interdisciplinary Centers of Excellence.  A focus of the research must be identifying better ways to obtain needed data.

   Federal funding for this research comes from the National Library of Medicine, the Agency for Healthcare Research and Quality, the Centers for Disease Control and Prevention, and the Defense Advanced Research Projects Agency.