The data quality analyzer: A quality control program for seismic data
Introduction
The Albuquerque Seismological Laboratory (ASL) operates nearly 200 seismic stations as part of the Global Seismographic Network (GSN) and the Advanced National Seismic System (ANSS). The data produced from these stations are fundamental to research studies of earthquake sources and Earth structure, and they underpin the operations of the National Earthquake Information Center (NEIC), which provides accurate and timely earthquake information through products such as alerts, Web pages, ShakeMaps, and Prompt Assessment of Global Earthquakes for Response (PAGER) impact estimates (Earle et al., 2009). To ensure the usability of the data, ASL staff members perform data quality analysis. Traditionally, this has been conducted by waveform review, through daily and weekly “runs” through the stations, supplemented by automated notifications about problems with availability, timing quality, and other data integrity issues; evaluation of power-spectral density; and use of tidal synthetics to catch large-scale problems in polarity and gain. These techniques generally work well for verifying the state of health of a station but are not well suited to capturing subtle problems or issues that develop gradually over time, such as the degradation of STS-1 responses resulting from humidity in the feedback electronics boxes (Hutt and Ringler, 2011). As a result, the ASL has recently developed and implemented a number of tools to monitor station performance in situ, such as using PQLX (PASSCAL Quick Look eXtended; McNamara and Buland, 2004) and synthetic seismograms to identify changes in gain at GSN stations (Ringler et al., 2010, 2012a), as well as implementing an annual calibration process (Ringler et al., 2012b).
To facilitate the use of multiple metrics in identifying problems and to enable the quantification of data quality, we developed a framework, called the Data Quality Analyzer (DQA), that routinely computes data metrics and displays the results in an easy-to-use interface. The DQA consists of components for scanning miniSEED (Ahern et al., 2009) data and computing the metrics (SEEDscan), storing them in a database, and displaying the results on a Web interface. The system is configurable to accommodate future developments or changes: metrics can be added and modified through an Extensible Markup Language (XML) configuration file. The code may be run as a scheduled task (e.g., nightly) or on demand to ensure the latest metrics are available. The DQA makes extensive use of hash signatures to ensure that changes in either metadata or data trigger a rescan that updates the metrics.
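The role of the hash signatures can be illustrated with a short sketch (hypothetical; the function names and the choice of digest here are ours, and the actual SEEDscan implementation differs). A digest of a day's data and metadata is stored alongside the computed metrics; on a later run, a mismatched digest marks that day for rescanning:

```python
import hashlib

def digest(data: bytes, metadata: bytes) -> str:
    """Combined hash of a day's waveform data and its metadata."""
    h = hashlib.sha256()
    h.update(data)
    h.update(metadata)
    return h.hexdigest()

def needs_rescan(stored_digest: str, data: bytes, metadata: bytes) -> bool:
    """True if either the data or the metadata changed since the last scan."""
    return digest(data, metadata) != stored_digest

# Example: a metadata update (e.g., a revised instrument response)
# forces the metrics for that day to be recomputed.
d0 = digest(b"waveform", b"response-v1")
print(needs_rescan(d0, b"waveform", b"response-v1"))  # prints False
print(needs_rescan(d0, b"waveform", b"response-v2"))  # prints True
```

Because the digest covers both inputs, a change to either one is sufficient to invalidate the stored metric values.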
In this paper we discuss the overall DQA structure including the flow of SEEDscan, the database, and the Web interface as well as describe the currently implemented metrics. Using these metrics, we illustrate a number of common data problems, including some subtle problems not obvious from simple inspection of time series or power spectra. Finally, we discuss future development plans.
The code
The DQA naturally breaks into three distinct pieces: the SEEDscan metric calculator, the database, and the interface. In addition, there is auxiliary code that supports the DQA process.
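As a sketch of how a metric might be declared in the XML configuration file (the element and attribute names here are illustrative and may not match the actual DQA schema):

```xml
<!-- Hypothetical metric declaration: the class to run and an
     argument restricting it to particular channel codes. -->
<metric>
  <class_name>asl.seedscan.metrics.AvailabilityMetric</class_name>
  <argument name="channel-restriction">LH,BH</argument>
</metric>
```

Declaring metrics in configuration rather than in code is what allows new metrics to be added or modified without rebuilding the scanner.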
Metrics
A number of metrics have been developed or adopted by the ASL and are currently in production for monitoring data quality (Table 1); others are still under test. Below, we describe the currently implemented metrics, organized by increasing complexity, using examples from the stations operated by the ASL in the GSN (network codes CU, IC, and IU), the ANSS backbone (network code US), and two regional networks (network codes IW and NE).
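To give a concrete sense of the simplest class of metric, a daily availability value can be computed as the fraction of expected samples that were actually recorded. The sketch below is illustrative only; the DQA's own implementation operates on miniSEED records rather than on a pre-extracted list of contiguous segments:

```python
def availability(segments, sample_rate, day_seconds=86400.0):
    """Percent of a day covered by recorded data.

    segments: list of (start_s, end_s) tuples of contiguous data,
    in seconds from the start of the day, non-overlapping.
    """
    expected = sample_rate * day_seconds
    recorded = sum((end - start) * sample_rate for start, end in segments)
    return 100.0 * recorded / expected

# A day with one 2-hour gap on a 1 sample-per-second channel:
segs = [(0.0, 36000.0), (43200.0, 86400.0)]
print(round(availability(segs, 1.0), 2))  # prints 91.67
```

More complex metrics follow the same pattern — a scalar value per station-channel-day — which is what makes them easy to store, plot, and compare across a network.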
DQA examples
In most cases, the metrics in the DQA are indirect measures of data quality. For example, stability of sensor gain is an important data quality attribute. However, we are not able to make direct measurements of the gain remotely without running calibrations and must rely on well-formulated metrics to identify changes on time scales shorter than the annual calibration schedule. Similarly, one of the limitations of traditional waveform review is that subtle changes in noise levels or response are difficult to detect by inspection alone.
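One simple indirect measure of gain stability is the amplitude ratio between an observed trace and a time-aligned reference, such as a synthetic seismogram or a co-located sensor. The sketch below is a simplified illustration (the DQA's actual gain metrics are more involved); a ratio that drifts away from 1 over many events suggests a gain change:

```python
import math

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def relative_gain(observed, reference):
    """Amplitude ratio of an observed trace to a time-aligned reference."""
    return rms(observed) / rms(reference)

# A 5% gain increase shows up directly in the ratio:
ref = [math.sin(0.1 * i) for i in range(1000)]
obs = [1.05 * v for v in ref]
print(round(relative_gain(obs, ref), 3))  # prints 1.05
```

Tracking such a ratio routinely is what allows a slow drift to be caught between annual calibrations, rather than only at the next calibration.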
Data quality assessment
One of the motivations for the development of the DQA is the desire to quantify data quality. Data quality is notoriously difficult to define and depends largely on the problem to be solved or the user's intended application of the data. For the GSN, data quality assessments (e.g., http://www.iris.edu/hq/programs/gsn/quality) are typically conducted after large earthquakes and are qualitative in nature. One of the few quantitative measures used is availability, which is a measure of network performance rather than of data quality itself.
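A single quantitative grade can be formed by combining per-metric scores, for example as a weighted mean with each metric normalized to a 0–100 scale. The metric names and weights below are purely illustrative, not the DQA's published scheme:

```python
def quality_score(scores, weights):
    """Weighted mean of per-metric scores (each on a 0-100 scale)."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

# Hypothetical per-metric scores and weights for one station-day:
scores = {"availability": 98.0, "timing": 90.0, "noise": 80.0}
weights = {"availability": 2.0, "timing": 1.0, "noise": 1.0}
print(round(quality_score(scores, weights), 1))  # prints 91.5
```

The choice of weights is itself an application-dependent judgment, which is one reason a single number can never fully replace the individual metrics.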
Discussion and future work
We have completed the initial phase of the DQA development, with 11 metrics implemented out of 18 planned (Table 1). This first set of metrics provides a proof of concept of the DQA and demonstrates its applicability to data quality analysis on regional, national, and global networks.
Priorities for further development of the DQA include the completion of the remaining metrics, improvement of the plotting features in the existing interface, and the expansion of metrics to higher frequencies.
Conclusions
To complement the efforts to develop a clear set of data quality goals and to distribute information on instrument quality in XML, the DQA is designed to enhance the ability of the ASL to identify and communicate data quality issues. The DQA supplements traditional waveform review and provides new capabilities to characterize data quality by using multiple data quality metrics together. The DQA is designed to be flexible for adding new metrics and is portable for use by other network operators.
Acknowledgments
We thank Benjamin Marshall, Leo Sandoval, and Tyler Storm for feedback on initial versions of the DQA interface as well as for making suggestions on the metrics. We thank Daniel McNamara for useful discussions on developing noise baselines. We thank Kent Anderson, Pete Davis, and Mary Templeton for useful discussions regarding various metrics and how best to implement them. Finally, we thank Robert Casey, Charles Hutt, Mouse Reusch, and Mary Templeton for helpful reviews of the manuscript.
References
- Ahern, T., Casey, R., Barnes, D., Benson, R., Knight, T., Trabant, C., 2009. SEED Reference Manual, version 2.4, ...
- Apache Software Foundation, 2011. Apache Commons, Forest Hills, MD. 〈http://commons.apache.org/〉 (accessed ...).
- et al., 2004. Ambient earth noise: a survey of the global seismographic network. J. Geophys. Res.
- et al., 1999. The TauP Toolkit: Flexible seismic travel-time and ray-path utilities. Seismol. Res. Lett.
- Crotwell, H.P., 2002. SeedCodec. 〈http://www.seis.sc.edu/downloads/seedCodec/〉 (accessed ...).
- Earle, P.S., Wald, D.J., Jaiswal, K.S., Allen, T.I., Hearne, M.G., Marano, K.D., Hotovec, A.J., Fee, J.M., 2009. Prompt ...
- et al., 2006. Observations of time-dependent errors in long-period instrument gain at global seismic stations. Seismol. Res. Lett.
- et al., 2011. Some possible causes of and corrections for STS-1 response changes in the Global Seismographic Network. Seismol. Res. Lett.
- Java for seismologists, 2000.
- Lomax, A., 2014. Software for observation, analysis, and understanding of seismological information. ALomax Scientific, ...
1. Now at Instrumental Software Technologies, Inc., P.O. Box 963, New Paltz, NY 12561, USA.