Elsevier

Information Systems

Volume 38, Issue 8, November 2013, Pages 1234-1251

CIRCE: Correcting Imprecise Readings and Compressing Excrescent points for querying common patterns in uncertain sensor streams

https://doi.org/10.1016/j.is.2012.01.003

Abstract

Continuous sensor stream data are often recorded as a series of discrete points in a database, from which knowledge can be retrieved through queries. Two classes of uncertainty inevitably arise in sensor streams. The first is Uncertainty due to Discrete Sampling (DS Uncertainty): even if every discrete point is correct, the discrete sensor stream is uncertain (that is, it is not exactly like the continuous stream) because some critical points are missing owing to the limited capabilities of the sensing equipment and the database server. The second is Uncertainty due to Sampling Error (SE Uncertainty): sensor readings of the same situation cannot be reproduced exactly when they are recorded at different times or by different sensors, because the sampling errors differ. These two uncertainties reduce the efficiency and accuracy of querying common patterns. However, existing algorithms generally resolve only SE Uncertainty. In this paper, we propose a novel method of Correcting Imprecise Readings and Compressing Excrescent (CIRCE) points. In particular, to resolve DS Uncertainty, we develop a novel CIRCE core algorithm within the CIRCE method that corrects the missing critical points while compressing the original sensor streams. An experimental study on sensor stream datasets of various sizes validates that the CIRCE core algorithm is more efficient and more accurate than a counterpart algorithm for compressing sensor streams. The CIRCE method also resolves SE Uncertainty. An application to querying longest common route patterns validates the effectiveness of the CIRCE method.

Introduction

With advances in satellite, Radio-Frequency IDentification (RFID), Global Positioning System (GPS), wireless and video technologies, sensor stream database systems that manage time series of sensor readings are becoming increasingly available. Continuous sensor stream data are often recorded as a series of discrete points in databases [1], from which useful knowledge can be retrieved through queries. Querying sensor stream databases has a wide range of applications, such as monitoring the locations of moving objects (flocks [2], vehicles [3], [4], [5], [6], cloud clusters [7], fleets [8]), surveillance of environmental physical parameters (e.g., temperature [9] and humidity) and automatic control of robots through vision, sound and radio sensors.

However, sensor streams are inevitably uncertain, and data mining over uncertain sensor streams has therefore become an active research topic [11]. Unlike approximate average (sum) queries over uncertain stream data [10], [9], nearest neighbor queries on uncertain sensor streams [9], top-k queries over uncertain data in Peer-to-Peer (P2P) networks [12] and clustering of uncertain sensor streams [11], this paper focuses on querying common patterns from multiple sensor streams. Typical examples of common patterns are the convoy [5], the spatiotemporal sequential pattern [13], the trajectory pattern [6] and the longest common route (LCR) pattern [14], explained as follows:

  • A convoy is defined in [5] as a group of objects that travel together for at least a given time span.

  • A spatiotemporal sequential pattern [13] is a Sequential PAttern (SPA) of route segments, where each route segment is visited by at least min_sup (minimal number of supports) objects.

  • A trajectory pattern [6] is a set of objects traveling a common route with similar sequences of durations, where a route is denoted by a sequence of popular regions.

  • A longest common route (LCR) pattern [14] is a route visited by the same list of at least min_sup moving objects, where the route is denoted by a sequence of turning regions.
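To make the LCR definition concrete, the following minimal Python sketch assumes each object's trajectory has already been summarized as a list of turning-region IDs (a hypothetical input format) and searches, by brute force, for the longest contiguous route shared by at least min_sup objects. The function name `longest_common_route` is illustrative only; it is not the retrieval algorithm of [14].

```python
from collections import defaultdict


def longest_common_route(sequences, min_sup):
    """Return (route, supporting objects) for the longest contiguous route of
    region IDs that is visited by at least min_sup objects.

    sequences: dict mapping object ID -> list of turning-region IDs
    (hypothetical input format). Enumerates every substring, so it is only
    suitable for small illustrative inputs.
    """
    # Map each candidate route (as a tuple) to the set of objects containing it.
    support = defaultdict(set)
    for obj, seq in sequences.items():
        n = len(seq)
        for i in range(n):
            for j in range(i + 1, n + 1):
                support[tuple(seq[i:j])].add(obj)

    # Keep the longest route whose support reaches min_sup.
    best_route, best_objs = (), set()
    for route, objs in support.items():
        if len(objs) >= min_sup and len(route) > len(best_route):
            best_route, best_objs = route, objs
    return list(best_route), sorted(best_objs)


sequences = {
    "o1": ["A", "B", "C", "D"],
    "o2": ["X", "B", "C", "D"],
    "o3": ["B", "C", "E"],
}
print(longest_common_route(sequences, min_sup=2))  # (['B', 'C', 'D'], ['o1', 'o2'])
print(longest_common_route(sequences, min_sup=3))  # (['B', 'C'], ['o1', 'o2', 'o3'])
```

Raising min_sup shortens the route, because fewer objects share a long sequence of regions.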

We compare the four common patterns in Table 1. All of them require that the same sequence of places be visited by at least min_sup moving objects, so spatial similarity may be the most important factor in determining a common pattern. However, determining whether two objects visit the same place is challenging because spatial points are uncertain: different objects sample different locations at the same place.

The goal of this paper is therefore to tackle the uncertainty problems in trajectories so that common patterns can be queried effectively. We take querying longest common route patterns as a typical example of this goal; details are presented in Section 5. In our view, the following two types of uncertainty in sensor streams affect the efficiency and accuracy of querying the above-mentioned common patterns.

  • Uncertainty due to Discrete Sampling (DS Uncertainty). Even if every discrete point is correct, the discrete points on the time series are uncertain, that is, they do not exactly match the continuous stream, because some critical points are missing owing to the limited capabilities of the sensing equipment and the database server [15]. On the one hand, a large number of excrescent points exist, which wastes storage resources and reduces the efficiency of user queries. On the other hand, missing critical points make users' queries inaccurate.

  • Uncertainty due to Sampling Error (SE Uncertainty). Sensor readings of the same situation cannot be reproduced exactly when they are recorded at different times or by different sensors. The trajectories of different moving objects [13] differ even if the objects move along the same route. For example, two moving objects passed the same region at time t0; however, their GPS sensors recorded two different location values: r1(loc(−37.6934, 144.7931), t0) and r2(loc(−37.6935, 144.7931), t0).

Many methods have been proposed to overcome Uncertainty due to Sampling Error. However, to the best of our knowledge, no related work tackles the Uncertainty due to Discrete Sampling problem, e.g., by correcting missing critical points.

The last column of Table 1 summarizes existing methods for removing Uncertainty due to Sampling Error; each is the method developed to discover the common pattern listed in the first column of the same row. The four methods work as follows:

  • The first method, based on clustering points, tolerates a bounded error [5]. In the example above, the Euclidean distance between r1 and r2 is 0.0001; if it is smaller than a threshold, say ε = 0.0002, r1 and r2 are taken as being in the same place. Clustering groups close points into clusters, and points in the same cluster are taken as being in the same place. To determine whether two trajectories are the same, every pair of corresponding locations must then be checked against the bounded error.

  • The second method clusters direct line segments and regards line segments as the same if they lie within a bounded rectangle [13].

  • The third solution is to discover Regions of Interest (RoI), e.g., popular regions such as intersections, and summarize original trajectories by using RoI IDs [6].

  • The fourth method, from our previous work [14], discovers turning regions mainly by clustering points, supplemented by clustering direct line segments, and simplifies trajectories using turning region IDs.
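The bounded-error test used by the first method can be sketched in a few lines of Python. This is a minimal illustration, assuming that latitude/longitude pairs are treated as planar coordinates (as in the r1/r2 example above) and that two trajectories are compared point by point only when they have the same length; `same_place` and `same_trajectory` are hypothetical helper names, not code from [5].

```python
import math


def same_place(p, q, eps):
    """Bounded-error test: two sampled locations are treated as the same
    place if their Euclidean distance is at most eps.
    Assumption: (lat, lon) pairs are treated as planar coordinates."""
    return math.dist(p, q) <= eps


def same_trajectory(t1, t2, eps):
    """Two trajectories (lists of (lat, lon) points) are treated as the same
    if they have equal length and every pair of corresponding points lies
    within the bounded error eps."""
    return len(t1) == len(t2) and all(same_place(p, q, eps) for p, q in zip(t1, t2))


# The r1/r2 readings from the SE Uncertainty example.
r1 = (-37.6934, 144.7931)
r2 = (-37.6935, 144.7931)
print(round(math.dist(r1, r2), 7))     # 0.0001
print(same_place(r1, r2, eps=0.0002))  # True
```

The point-by-point comparison is what makes this approach fragile under DS Uncertainty: if one trajectory lacks a sampled turning point, the corresponding pairs no longer line up.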

However, none of these methods can process trajectories with missing critical points. Fig. 1(a) gives an example: two trajectories along the same route cannot be recognized as a common route pattern by any of the four methods because a turning point (or inflexion) near Region B is missing. The terms inflexion and turning point are used interchangeably in this paper.
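One simple way to estimate a missing turning point such as the one near Region B is to extend the straight runs sampled before and after the turn and intersect them. The Python sketch below shows only this generic line-intersection idea; it is an assumption for illustration, not the DICMI procedure, whose details are given in Section 4. The helpers `intersect_lines` and `estimate_missing_inflexion` are hypothetical.

```python
def intersect_lines(p1, p2, p3, p4):
    """Intersection of the infinite line through p1, p2 with the line through
    p3, p4, or None if the lines are (nearly) parallel."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < 1e-12:
        return None  # parallel runs: no turn to recover
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / den
    y = (a * (y3 - y4) - (y1 - y2) * b) / den
    return (x, y)


def estimate_missing_inflexion(before, after):
    """Hypothetical helper: extend the last two points sampled before a gap and
    the first two points sampled after it, and return their intersection."""
    return intersect_lines(before[-2], before[-1], after[0], after[1])


# An object moves east along y = 0, then north along x = 3, but the corner
# (3, 0) was never sampled.
print(estimate_missing_inflexion([(0, 0), (1, 0), (2, 0)], [(3, 1), (3, 2)]))  # (3.0, 0.0)
```

Using only two points on each side keeps the sketch short; with noisy readings, fitting a line to several consecutive points on each side would be more robust.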

In this paper, we propose a novel Correcting Imprecise Readings and Compressing Excrescent points (CIRCE) method that resolves both DS Uncertainty and SE Uncertainty, with the aim of efficient and accurate queries of common patterns from sensor streams. The main idea of the CIRCE method is as follows. To resolve Uncertainty due to Discrete Sampling, we develop a CIRCE core algorithm comprising two main procedures: the Detecting Inflexions and Computing Missing Inflexions (DICMI) procedure and the Angle-DP procedure. The DICMI procedure detects local inflexions, including missing ones, as inflexion candidates from local consecutive points on the original sensor streams. The Angle-DP procedure tests whether each inflexion candidate is a global inflexion and thereby removes false candidates. Next, to tackle Uncertainty due to Sampling Error, we group inflexions into clusters, treat inflexions in the same cluster as being in the same region, and use sequences of cluster IDs to compress the sensor streams. Common patterns can thus be queried with methods developed for exact data, such as querying longest common substrings (LCS). Finally, we develop an efficient procedure for Discovering Implicit Semantic Places (DISP) to ensure the accuracy of querying common patterns directly from cluster ID sequences.
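The path from inflexions to cluster ID sequences to an LCS query can be illustrated as follows. This Python sketch assumes the inflexions of each trajectory have already been extracted, and it substitutes a simple greedy leader clustering with radius eps for whatever clustering CIRCE actually uses; `leader_cluster`, `to_id_sequence` and `longest_common_substring` are hypothetical names. Consecutive repeated IDs are dropped so that each sequence lists the regions visited in order.

```python
import math


def leader_cluster(points, eps):
    """Greedy leader clustering (a stand-in, not CIRCE's clustering): a point
    joins an existing cluster if some leader is within eps; otherwise it
    becomes the leader of a new cluster. Returns the list of leaders."""
    leaders = []
    for p in points:
        if not any(math.dist(p, c) <= eps for c in leaders):
            leaders.append(p)
    return leaders


def to_id_sequence(inflexions, leaders, eps):
    """Replace each inflexion by the ID (index) of the first leader within eps,
    dropping consecutive repeats. Assumes the inflexions were among the points
    passed to leader_cluster, so a matching leader always exists."""
    seq = []
    for p in inflexions:
        cid = next(i for i, c in enumerate(leaders) if math.dist(p, c) <= eps)
        if not seq or seq[-1] != cid:
            seq.append(cid)
    return seq


def longest_common_substring(a, b):
    """Classic O(len(a) * len(b)) dynamic programme over common-suffix lengths."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Inflexions of two trajectories that share three turning regions.
t1 = [(0, 0), (3, 0), (3, 3), (6, 3)]
t2 = [(0.05, 0.0), (3.02, 0.01), (3.0, 3.04), (9, 9)]
eps = 0.1
leaders = leader_cluster(t1 + t2, eps)
seq1 = to_id_sequence(t1, leaders, eps)
seq2 = to_id_sequence(t2, leaders, eps)
print(seq1, seq2)                             # [0, 1, 2, 3] [0, 1, 2, 4]
print(longest_common_substring(seq1, seq2))   # [0, 1, 2]
```

Once readings are mapped to cluster IDs, the slightly different coordinates of the two trajectories no longer matter, and an exact-match string algorithm suffices.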

The contributions of this paper include the following:

  • The CIRCE core algorithm is one of the major contributions of this paper, since it corrects missing critical points, as shown in Fig. 1(b). This distinguishes it from existing data stream compression algorithms, which can only compress redundant data and cannot correct missing data. Fig. 1(b) shows that correcting missing inflexions helps improve query accuracy. The most closely related work, the Douglas–Peucker (DP) algorithm [16], compresses trajectories to reduce the enormous volume of data [15], [17], [13], but it only removes uncritical points and cannot correct missing points.

  • Moreover, the experimental study in this paper demonstrates that querying common patterns based on the CIRCE core algorithm is more accurate and efficient than DP-based methods. Interestingly, by correcting missing inflexions, the CIRCE method uses fewer inflexions to compress a sensor stream, and therefore clusters inflexions more efficiently because there are fewer of them.

  • Compared with queries on the original sensor streams, our CIRCE method (1) improves query quality and (2) enables highly efficient queries. In the experimental study, we take querying longest common route (LCR) patterns from sensor stream datasets of various sizes as an example to validate the accuracy and efficiency of our CIRCE method.
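For comparison, the classic Douglas–Peucker algorithm [16] mentioned above can be written recursively as below. This is a textbook-style Python sketch with a hypothetical tolerance parameter `eps`; it is not the Angle-DP procedure used in CIRCE. The second call illustrates the limitation noted in the text: DP only selects a subset of the sampled points, so when the corner (3, 0) was never sampled it cannot be recovered.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b
    (or to a itself if a == b)."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    num = abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1))
    return num / math.dist(a, b)


def douglas_peucker(points, eps):
    """Recursive Douglas-Peucker simplification: find the interior point
    farthest from the chord between the endpoints; if it is within eps,
    keep only the endpoints, otherwise split there and recurse."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split point


# Corner (3, 0) sampled: DP keeps it and removes the excrescent points.
print(douglas_peucker([(0, 0), (1, 0.01), (2, 0), (3, 0), (3, 1), (3, 2)], 0.1))
# [(0, 0), (3, 0), (3, 2)]

# Corner (3, 0) missing: DP can only choose among the sampled points.
print(douglas_peucker([(0, 0), (1, 0.01), (2, 0), (3, 1), (3, 2)], 0.1))
# [(0, 0), (2, 0), (3, 1), (3, 2)]
```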

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 gives an overview of our solution. Section 4 details the CIRCE method. Section 5 introduces querying common patterns based on the CIRCE method. Section 6 evaluates the performance of the proposed method, and Section 7 concludes the paper.


Related work

In this section, we survey related work. We first introduce existing trajectory simplification methods that are used to remove excrescent points. We then introduce related solutions to the Uncertainty due to Sampling Error problem. To the best of our knowledge, no related work tackles the Uncertainty due to Discrete Sampling problem, e.g., by correcting missing critical points.

Overview of our solution

In this section, we formally define basic terms and present the overview of our solution.

Correcting Imprecise Readings and Compressing Excrescent (CIRCE) points method

In this section, we introduce the three major parts of CIRCE: the CIRCE core algorithm, the algorithm for discovering semantic places and the Discovering Implicit Semantic Places (DISP) procedure.

Querying common patterns from uncertain sensor streams based on the CIRCE method

In this section, we first introduce our CIRCE package, an implementation system of the CIRCE method, and then we present querying longest common route patterns from sensor streams as a typical application.

Performance evaluations

In our experiments, we use the application of querying LCR patterns to validate the efficiency and accuracy of our CIRCE method.

Conclusions

In conclusion, we proposed a novel CIRCE method to enhance the efficiency and accuracy of querying common patterns from uncertain sensor streams. The major contribution of the CIRCE method is to tackle Uncertainty due to Discrete Sampling (DS Uncertainty) and Uncertainty due to Sampling Error (SE Uncertainty). To resolve the DS Uncertainty, a novel CIRCE core algorithm was developed in the CIRCE method to correct the missing points while compressing the original sensor streams. The experimental...

References (39)

  • J.-G. Lee, J. Han, X. Li, H. Gonzalez, TraClass: trajectory classification using hierarchical region-based and...
  • A. Cuzzocrea, Providing probabilistically-bounded approximate answers to non-holistic aggregate range queries in OLAP,...
  • C. Zhang, M. Gao, A. Zhou, Tracking high quality clusters over uncertain data streams, in: ICDE2009, 2009, pp....
  • Y. Sun, Y. Yuan, G. Wang, Top-k query processing over uncertain data in distributed environments, World Wide Web,...
  • H. Cao, N. Mamoulis, D.W. Cheung, Mining frequent spatiotemporal sequential patterns, in: ICDM'05, 2005, pp....
  • G. Huang, Y. Zhang, J. He, Efficiently retrieving longest common route patterns of moving objects by summarizing...
  • N. Meratnia et al., Spatiotemporal compression techniques for moving point objects
  • D.H. Douglas et al., Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, The Canadian Cartographer (1973)
  • K. Thapa, Data compression and critical points detection using normalized symmetric scattered matrix, Autocarto (1989)