Computers & Geosciences

Volume 76, March 2015, Pages 96-111

The data quality analyzer: A quality control program for seismic data

https://doi.org/10.1016/j.cageo.2014.12.006

Highlights

  • New standalone software for automated data quality analysis.

  • Data quality metrics are computed and stored in a database easily viewable through a webpage.

  • Users can interactively assign metric weights on the website to view data quality from different perspectives.

  • Code computes daily metrics as well as event-based metrics.

Abstract

The U.S. Geological Survey's Albuquerque Seismological Laboratory (ASL) has several initiatives underway to enhance and track the quality of data produced from ASL seismic stations and to improve communication about data problems to the user community. The Data Quality Analyzer (DQA) is one such development and is designed to characterize seismic station data quality in a quantitative and automated manner.

The DQA consists of a metric calculator, a PostgreSQL database, and a Web interface: The metric calculator, SEEDscan, is a Java application that reads and processes miniSEED data and generates metrics based on a configuration file. SEEDscan compares hashes of metadata and data to detect changes in either and performs subsequent recalculations as needed. This ensures that the metric values are up to date and accurate. SEEDscan can be run as a scheduled task or on demand. The PostgreSQL database acts as a central hub where metric values and limited station descriptions are stored at the channel level with one-day granularity. The Web interface dynamically loads station data from the database and allows the user to make requests for time periods of interest, review specific networks and stations, plot metrics as a function of time, and adjust the contribution of various metrics to the overall quality grade of the station.

The quantification of data quality is based on the evaluation of various metrics (e.g., timing quality, daily noise levels relative to long-term noise models, and comparisons between broadband data and event synthetics). Users may select which metrics contribute to the assessment and those metrics are aggregated into a “grade” for each station. The DQA is being actively used for station diagnostics and evaluation based on the completed metrics (availability, gap count, timing quality, deviation from a global noise model, deviation from a station noise model, coherence between co-located sensors, and comparison between broadband data and synthetics for earthquakes) on stations in the Global Seismographic Network and Advanced National Seismic System.

Introduction

The Albuquerque Seismological Laboratory (ASL) operates nearly 200 seismic stations as part of the Global Seismographic Network (GSN) and the Advanced National Seismic System (ANSS). The data produced from these stations are fundamental to research studies of earthquake sources and earth structure and underpin the operations of the National Earthquake Information Center (NEIC), which provides accurate and timely earthquake data and products such as alerts, Web pages, ShakeMaps, and Prompt Assessment of Global Earthquakes for Response (PAGER) impact estimates (Earle et al., 2009). In order to ensure the usability of the data, the ASL staff members perform data quality analysis. Traditionally, this has been conducted by waveform review, through both daily and weekly “runs” through the stations, supplemented by automated notifications about problems with availability, timing quality, and other data integrity issues, evaluation of power-spectral density, and use of tidal synthetics to catch large-scale problems in polarity and gain. These techniques generally work well for verifying the state of health of a station but are not well suited to capturing subtle problems or issues that develop gradually over time, such as the degradation of STS-1 responses resulting from humidity in the feedback electronics boxes (Hutt and Ringler, 2011). As a result, the ASL has recently developed and implemented a number of tools to monitor station performance in situ, such as using PQLX (PASSCAL Quick Look eXtended; McNamara and Buland, 2004) and synthetic seismograms to identify changes in gain at GSN stations (Ringler et al., 2010, Ringler et al., 2012a), as well as implementing an annual calibration process (Ringler et al., 2012b).

In order to facilitate the use of multiple metrics to identify problems and to enable the quantification of data quality, we developed a framework, called the Data Quality Analyzer (DQA), to compute data metrics routinely and display the results in an easy-to-use interface. The DQA consists of components for scanning miniSEED (Ahern et al., 2009) data and computing the metrics (SEEDscan), storing them in a database, and displaying the results on a Web interface. The system is configurable to accommodate future developments or changes, and we are able to add and modify metrics through an Extensible Markup Language (XML) configuration file. The code may be run as a scheduled task (e.g., nightly) or on demand to ensure the latest metrics are available. The DQA makes extensive use of hash signatures to ensure that changes in either metadata or data trigger a rescan to update the metrics.
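To illustrate the hash-signature logic described above, the following is a minimal Java sketch, not the SEEDscan implementation itself; the MetricStore class, the computeMetric placeholder, and the choice of SHA-256 are assumptions made here for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of hash-based change detection: a stored metric value is
 * reused when the combined digest of a channel-day's waveform data and
 * metadata matches the digest saved with the previous value, and the
 * metric is recomputed otherwise. MetricStore and computeMetric are
 * hypothetical stand-ins for the SEEDscan/database components.
 */
public final class RescanCheck {

    /** Hypothetical in-memory stand-in for the PostgreSQL metric table. */
    static final class MetricStore {
        private final Map<String, byte[]> digests = new HashMap<>();
        private final Map<String, Double> values = new HashMap<>();

        byte[] storedDigest(String key) { return digests.get(key); }
        double storedValue(String key)  { return values.get(key); }

        void save(String key, byte[] digest, double value) {
            digests.put(key, digest);
            values.put(key, value);
        }
    }

    /** Digest data and metadata together, so a change in either triggers a rescan. */
    static byte[] digest(byte[] miniSeedData, byte[] metadata) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(miniSeedData);
        md.update(metadata);
        return md.digest();
    }

    /** Placeholder: a real metric computation on decoded data would go here. */
    static double computeMetric(byte[] miniSeedData) {
        return miniSeedData.length; // stand-in value
    }

    /** Return an up-to-date metric value, recomputing only when inputs changed. */
    static double metricValue(MetricStore store, String channelDay,
                              byte[] miniSeedData, byte[] metadata) throws NoSuchAlgorithmException {
        byte[] current = digest(miniSeedData, metadata);
        byte[] stored = store.storedDigest(channelDay);
        if (stored != null && Arrays.equals(stored, current)) {
            return store.storedValue(channelDay); // inputs unchanged: reuse stored value
        }
        double value = computeMetric(miniSeedData); // stale or missing: recompute
        store.save(channelDay, current, value);
        return value;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MetricStore store = new MetricStore();
        byte[] data = {1, 2, 3};
        byte[] meta = {9};
        System.out.println(metricValue(store, "IU.ANMO.00.BHZ:2014-001", data, meta)); // computes
        System.out.println(metricValue(store, "IU.ANMO.00.BHZ:2014-001", data, meta)); // reuses
    }
}
```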

In this paper we discuss the overall DQA structure including the flow of SEEDscan, the database, and the Web interface as well as describe the currently implemented metrics. Using these metrics, we illustrate a number of common data problems, including some subtle problems not obvious from simple inspection of time series or power spectra. Finally, we discuss future development plans.

Section snippets

The code

The DQA naturally breaks into three distinct pieces: the SEEDscan metric calculator, the database, and the interface. In addition, there is auxiliary code that supports the DQA process.

Metrics

A number of metrics have been developed or adopted by the ASL and are currently in production for monitoring data quality (Table 1); others are still under test. Below, we describe the currently implemented metrics, organized by increasing complexity, using examples from the stations operated by the ASL in the GSN (network codes CU, IC, and IU), the ANSS backbone (network code US), and two regional networks (network codes IW and NE).
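As a concrete example of the simplest class of metric, the sketch below computes a daily availability percentage from sample counts. The inputs (nominal sample rate from metadata, recovered sample count from the day's records) are assumptions for illustration, not the actual SEEDscan classes.

```java
/**
 * Minimal sketch of an availability-style metric: the percentage of
 * expected samples actually recovered for one channel-day. The real
 * SEEDscan metric works from decoded miniSEED records and station
 * metadata; this version only illustrates the arithmetic.
 */
public final class Availability {

    /**
     * @param sampleRate       nominal sample rate from metadata (samples/s)
     * @param recoveredSamples number of samples present in the day's records
     * @return availability as a percentage of the expected daily sample count
     */
    static double percent(double sampleRate, long recoveredSamples) {
        double expected = sampleRate * 86400.0; // samples expected in 24 h
        if (expected <= 0) {
            throw new IllegalArgumentException("sample rate must be positive");
        }
        return 100.0 * Math.min(1.0, recoveredSamples / expected);
    }

    public static void main(String[] args) {
        // e.g., a 40 sample/s channel with a one-hour gap:
        long recovered = (long) (40 * 86400) - (long) (40 * 3600);
        System.out.printf("availability = %.2f%%%n", percent(40.0, recovered)); // ~95.83%
    }
}
```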

DQA examples

In most cases, the metrics in the DQA are indirect measures of data quality. For example, stability of sensor gain is an important data quality attribute. However, we are not able to make direct measurements of the gain remotely without running calibrations and must rely on well-formulated metrics to identify changes on time scales shorter than the annual calibration schedule. Similarly, one of the limitations of traditional waveform review is that subtle changes in noise levels or response are
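As one illustration of such an indirect measure, a relative gain estimate can be formed by least-squares fitting a single scale factor between an observed waveform and an event synthetic, in the spirit of the data/synthetic comparison metric. The sketch below is schematic, assuming pre-processed, aligned windows; it is not the DQA implementation, which must also account for instrument response and filtering.

```java
/**
 * Schematic sketch of an indirect gain check: fit a scale factor s
 * minimizing || data - s * synthetic ||^2 over a time window. A drift
 * in s across events hints at a gain change without running a direct
 * calibration.
 */
public final class GainProxy {

    /** Least-squares scale factor s = <d, y> / <y, y> between data d and synthetic y. */
    static double scaleFactor(double[] data, double[] synthetic) {
        if (data.length != synthetic.length || data.length == 0) {
            throw new IllegalArgumentException("windows must be non-empty and equal length");
        }
        double num = 0.0, den = 0.0;
        for (int i = 0; i < data.length; i++) {
            num += data[i] * synthetic[i];
            den += synthetic[i] * synthetic[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] synthetic = {0.0, 1.0, 0.5, -0.5, -1.0, 0.0};
        double[] data = new double[synthetic.length];
        for (int i = 0; i < data.length; i++) {
            data[i] = 0.9 * synthetic[i]; // sensor apparently running ~10% low
        }
        System.out.printf("scale factor = %.3f%n", scaleFactor(data, synthetic));
    }
}
```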

Data quality assessment

One of the motivations for the development of the DQA is the desire to quantify data quality. Data quality is notoriously difficult to define and depends largely on the problem to be solved or the user's intended application of the data. For the GSN, data quality assessments (e.g., http://www.iris.edu/hq/programs/gsn/quality) are typically conducted after large earthquakes and are very qualitative in nature. One of the few quantitative measures used is availability, which is a performance
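The abstract describes user-adjustable weights that aggregate metric values into a station "grade". A minimal sketch of one such weighted aggregation follows, assuming each metric has already been normalized to a common 0-100 scale; the actual DQA normalization and weighting scheme are not specified here, so treat this as illustrative.

```java
import java.util.Map;

/**
 * Minimal sketch of aggregating per-metric scores into a station grade
 * as a weighted average. Assumes scores are pre-normalized to 0-100;
 * the DQA's actual aggregation may differ.
 */
public final class StationGrade {

    /** Weighted average of the scores, using only metrics that were computed. */
    static double grade(Map<String, Double> scores, Map<String, Double> weights) {
        double weightedSum = 0.0, totalWeight = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            Double score = scores.get(e.getKey());
            if (score == null) continue; // metric not computed for this station
            weightedSum += e.getValue() * score;
            totalWeight += e.getValue();
        }
        if (totalWeight == 0.0) {
            throw new IllegalArgumentException("no weighted metrics available");
        }
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical normalized scores and user-chosen weights:
        Map<String, Double> scores = Map.of("availability", 99.2, "timingQuality", 95.0, "gapCount", 88.0);
        Map<String, Double> weights = Map.of("availability", 2.0, "timingQuality", 1.0, "gapCount", 1.0);
        System.out.printf("station grade = %.1f%n", grade(scores, weights));
    }
}
```

Because the weights live outside the metric calculations, a user can re-grade stations interactively on the Web interface without triggering any recomputation of the underlying metric values.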

Discussion and future work

We have completed the initial phase of the DQA development, with 11 metrics implemented out of 18 planned (Table 1). This first set of metrics provides a proof of concept of the DQA and demonstrates its applicability to data quality analysis on regional, national, and global networks.

Priorities for further development of the DQA involve the completion of the remaining metrics, the improvement of the existing interface's plotting features, the expansion of metrics to higher frequencies, and

Conclusions

To complement the efforts to develop a clear set of data quality goals and to distribute information on instrument quality in XML, the DQA is designed to enhance the ability of the ASL to identify and communicate data quality issues. The DQA supplements traditional waveform review and provides new capabilities to characterize data quality using multiple different data quality metrics together. The DQA is designed to be flexible for adding new metrics and is portable for use by network operators

Acknowledgments

We thank Benjamin Marshall, Leo Sandoval, and Tyler Storm for feedback on initial versions of the DQA interface as well as for making suggestions on the metrics. We thank Daniel McNamara for useful discussions on developing noise baselines. Finally, we thank Kent Anderson, Pete Davis, and Mary Templeton for useful discussions regarding various metrics and how best to implement them. We thank Robert Casey, Charles Hutt, Mouse Reusch, and Mary Templeton for helpful reviews of the manuscript. Any

References (21)

  • Ahern, T., Casey, R., Barnes, D., Benson, R., Knight, T., Trabant, C., 2009. SEED Reference Manual, version 2.4,...
  • Apache Software Foundation, 2011. Apache Commons. Forest Hills, MD 〈http://commons.apache.org/〉 (accessed...
  • Berger, J., et al., 2004. Ambient earth noise: a survey of the global seismographic network. J. Geophys. Res.
  • Crotwell, H.P., et al., 1999. The TauP Toolkit: Flexible seismic travel-time and ray-path utilities. Seismol. Res. Lett.
  • Crotwell, H.P., 2002. SeedCodec 〈http://www.seis.sc.edu/downloads/seedCodec/〉 (accessed...
  • Earle, P.S., Wald, D.J., Jaiswal, K.S., Allen, T.I., Hearne, M.G., Marano, K.D., Hotovec, A.J., Fee, J.M., 2009. Prompt...
  • Ekström, G., et al., 2006. Observations of time-dependent errors in long-period instrument gain at global seismic stations. Seismol. Res. Lett.
  • Hutt, C.R., et al., 2011. Some possible causes of and corrections for STS-1 response changes in the Global Seismographic Network. Seismol. Res. Lett.
  • Lomax, A., 2000. Java for seismologists.
  • Lomax, A., 2014. Software for observation, analysis, and understanding of seismological information, ALomax Scientific,...
There are more references available in the full text version of this article.

1 Now at Instrumental Software Technologies, Inc., P.O. Box 963, New Paltz, NY 12561, USA.
