Performance-Critical Anomaly Detection --- United States, December 2002--March 2004
Colin R. Goodall,1 A. Lent,1 S. Halasz,1 E. Koski,2 D. Agarwal,1 S. Tse,1 G. Jacobson1
Corresponding author: Colin R. Goodall, AT&T Labs, 200 S. Laurel Ave. D4 3D28, Middletown, NJ 07760. Telephone: 732-420-5816; Fax: 732-368-7201; E-mail: [email protected].
Disclosure of relationship: The contributors of this report have disclosed that they are employees of AT&T Labs or Quest Diagnostics, Inc., and that their employment compensation may include ownership of company stock. This report does not contain any discussion of unlabeled use of commercial products or products for investigational use.
Abstract
Introduction: Performance-critical anomaly detection for biomedical surveillance requires 1) reliable data that are both geotemporally and demographically representative; 2) efficient, real-time, large-scale information-processing capabilities; 3) comprehensive, tunable anomaly-detection algorithms; 4) a flexible platform for investigation and management of anomalies; and 5) alert distribution and management.
Objectives: This study analyzed a reliable, high-performance, end-to-end, modular process for early event detection that included data loading and transformation, statistical anomaly detection, and tools for user interaction.
Methods: The process architecture and implementation included three components: 1) a data layer, including modules for data loading, cleaning, normalization, coding, and aggregation; 2) an anomaly-detection layer, including multiple methods for statistical anomaly detection and an anomaly case manager; and 3) a presentation layer, including dynamic visualization of data (geographically, temporally, and logically) used in case investigation, publication, and process monitoring.
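The aggregation step of the data layer can be illustrated with a minimal sketch. It assumes that each cleaned record reduces to a (day, region, syndromic group) triple; the record layout, the region names, and the function names `normalize` and `aggregate_daily` are hypothetical and are not taken from the system described in this report.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (test date, region, syndromic group).
# The layout and values are illustrative only, not data from the study.
records = [
    (date(2003, 1, 6), "Region A ", "Respiratory"),
    (date(2003, 1, 6), "Region A", "respiratory"),
    (date(2003, 1, 6), "Region B", "gastrointestinal"),
    (date(2003, 1, 7), "Region B", " respiratory"),
]


def normalize(record):
    """Trim stray whitespace and lowercase the syndromic group code."""
    day, region, syndrome = record
    return day, region.strip(), syndrome.strip().lower()


def aggregate_daily(raw_records):
    """Count records in each (day, region, syndromic group) cell."""
    return Counter(normalize(r) for r in raw_records)


counts = aggregate_daily(records)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

A sparse mapping from cell to count is used here because most (day, region, group) cells are likely to be empty on any given day; a dense array would serve equally well for a small number of regions and groups.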
Specific statistical anomaly-detection methods included process-control techniques; SaTScan (a free software program that calculates spatial, temporal, and space-time scan statistics); a square-root technique; and a new adaptation of Bayesian shrinkage estimation (Kalman Filter Gamma Poisson Shrinker [KF GPS]) that monitors a stream of events organized into a periodic (daily) array of cross-classified counts with geographic and medical dimensions. Shrinkage estimates were obtained for the ratios of observed counts to proportionally fit expected counts; these estimates update smoothly over time after allowing for changes in marginal totals. KF GPS was also used to model spatial associations and dependencies among the medical measurements. The case manager organized groups of related anomalies into cases and supported collaboration by providing a set of functions and software linkages with which persons with subject-matter, statistical, and analytic expertise could investigate and manage anomalies. Each case could be resolved as an alert, deferred, or dismissed. The case manager included a logic-rich engine and two feature-rich, configurable tools for case organization and dynamic data visualization. Similar technology, used by AT&T for telecommunications monitoring and case management in an environment in which >300 million calls are received daily, was adapted to health-care data, including laboratory test and emergency room data, with comparable performance.
Results: In collaboration with Quest Diagnostics, Inc. (QDI), AT&T used a subset of QDI's nationwide testing data for December 2002--March 2004 for three syndromic groupings (respiratory, gastrointestinal, and heavy metals [lead]) in the New York City (NYC) metropolitan area and nationwide (lead only). The system computed approximately 600,000 scores, resulting in approximately 400 anomalies and their cases.
Anomalies detected included a spike in overall respiratory test requisitions in the area of Bensonhurst, Queens, NYC; a spike in mycobacteria requisitions in Orange County, New York; and a change in data coding affecting viral tests in Bergen County, New Jersey.
Conclusion: This analysis demonstrated 1) the importance of end-to-end process architecture; 2) the utility of multiple algorithms, especially KF GPS, for anomaly detection; and 3) the effectiveness of using a case manager to investigate anomalies and reduce the burden of false positives. The system can handle massive data streams and allows rapid anomaly detection through use of a suite of analytic, data management, and visualization tools.
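The scoring ideas named in the Methods (ratios of observed counts to proportionally fit expected counts, shrinkage of those ratios, and a square-root technique) can be illustrated with a simplified sketch. This is not KF GPS: it omits the Kalman-filter updating over time and uses a single Gamma(alpha, beta) prior, both of which are assumptions made here for brevity. The abstract does not define the square-root technique, so the variance-stabilizing form 2(sqrt(observed) - sqrt(expected)) is likewise an assumption, as are all function names and the example counts.

```python
import math


def expected_counts(table):
    """Proportional (independence) fit: E[i][j] = row_i * col_j / total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def shrunk_ratio(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the observed/expected ratio.

    Assumes observed ~ Poisson(ratio * expected) with a Gamma(alpha, beta)
    prior on the ratio (an illustrative choice, not the report's prior).
    Small cells are pulled toward the prior mean alpha / beta.
    """
    return (alpha + observed) / (beta + expected)


def sqrt_score(observed, expected):
    """Variance-stabilized deviation, roughly standard normal for Poisson counts."""
    return 2.0 * (math.sqrt(observed) - math.sqrt(expected))


# Hypothetical one-day table: rows are regions, columns are test groups.
table = [[10, 20],
         [30, 40]]
expected = expected_counts(table)

for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = expected[i][j]
        print(i, j, obs, exp, shrunk_ratio(obs, exp), sqrt_score(obs, exp))
```

Shrinkage matters most for sparse cells: a raw ratio of 2 observed to 0.5 expected is 4.0, whereas the shrunk estimate under the default prior here is (1 + 2) / (1 + 0.5) = 2.0, which makes a chance fluctuation in a small cell less likely to be flagged.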
Disclaimer: All MMWR HTML versions of articles are electronic conversions from ASCII text into HTML. This conversion may have resulted in character translation or format errors in the HTML version. Users should not rely on this HTML document, but are referred to the electronic PDF version and/or the original MMWR paper copy for the official text, figures, and tables. An original paper copy of this issue can be obtained from the Superintendent of Documents, U.S. Government Printing Office (GPO), Washington, DC 20402-9371; telephone: (202) 512-1800. Contact GPO for current prices.
Questions or messages regarding errors in formatting should be addressed to [email protected].
Date last reviewed: 8/5/2005