Applied Soft Computing

Volume 67, June 2018, Pages 910-919

Measuring traffic congestion: An approach based on learning weighted inequality, spread and aggregation indices from comparison data

https://doi.org/10.1016/j.asoc.2017.07.014

Highlights

  • We formulate the parameter (weight) learning problems for weighted spread and inequality-related functions.

  • We showcase an application of these ideas to the analysis and identification of traffic congestion.

  • We use real data as a means for demonstrating our techniques.

Abstract

As cities increase in size, governments and councils face the problem of designing infrastructure and approaches to traffic management that alleviate congestion. The problem of objectively measuring congestion involves taking into account not only the volume of traffic moving throughout a network, but also the inequality or spread of this traffic over major and minor intersections. For modeling such data, we investigate the use of weighted congestion indices based on various aggregation and spread functions. We formulate the weight learning problem for comparison data and use real traffic data obtained from a medium-sized Australian city to evaluate their usefulness.

Graphical abstract

Comparison of smoothed time series data for velocity, volume, weighted spread (using proportions calculated from relative volume) and an adjusted volume index.


Introduction

When making decisions based on sets of numerical data, as data analysts we will usually be interested in summaries that help us identify trends, central tendency, spread, etc., and which can be used to make objective comparisons. Classical operators such as the arithmetic mean and median have been recognized as special cases of much broader families of aggregation functions, which have been studied in depth in the areas of decision-making and fuzzy systems [7], [27], [35]. However, in many contexts there is also the need for a more dedicated study of summaries that indicate the variation or spread of data [23]. In particular, we have measures which evaluate the level of income inequality in economics [4], [20], [26], the evenness of species distributions in ecology [2], [32], [34], and disagreement between experts in group decision making [3], [8], [17], [1], [24].

Traffic analysis is a topic of interest across various research fields, with the core problems of understanding, modeling, predicting and reducing traffic congestion being useful not only in terms of efficient infrastructure and logistics, but also in terms of environmental impact. Traffic and network simulation has been useful in understanding different theories on flow (e.g. [29]) and also in predicting the impact of certain control measures, e.g. charging for entering central business zones [37]. Real data is obtained either from stationary sensors (cameras, vehicle detectors, etc.) [19], [39], [41] or, more recently, from GPS trajectories based on devices embedded in vehicles (especially vehicles such as taxis [36], where privacy concerns are less of an issue).

Decision makers and most real-time automated systems in traffic management rely on traffic volume data (e.g. see [31]), i.e. counts of the number of cars passing through an intersection over a given time interval (although across some freeway networks average speeds may also be available [19]). Rather than volume, road users, council decision makers and traffic managers will usually be interested in the level of congestion experienced across a given region or large network. Whereas volume can be measured objectively, the notion of congestion is somewhat more difficult to define. It has been approached as a binary classification task in [39], where an intersection is considered congested when the volume exceeds a certain threshold, while in [21] an expert system was proposed that distinguishes between incidents and congestion. We are interested in developing reliable indices of congestion given over a continuous scale, so that the impact of potential improvements to the network, e.g. from road work, new highways, changes to traffic light sequences, etc., can be measured. By being able to measure congestion objectively in a way that reflects the road-user experience (i.e. traffic jams and slower travel speeds), decision makers can then consider how best to reduce congestion.

Periods of high volume will often correspond with drivers experiencing high levels of congestion; however, in terms of the number of cars passing through an intersection, low counts can also be indicative of high congestion. While in [39] the congestion prediction problem was approached as one of feature selection in the presence of correlated variables, intuitively we can recognize that the function behavior we are interested in is one whose output can tell us when, in a local area of the network, large intersections have counts below their capacity while other intersections are all busier than normal. For this we turn to inequality, spread and consensus functions, all of which provide summaries of a dataset's variation, although from slightly different perspectives.
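
To make this intuition concrete, consider a purely illustrative example (the numbers below are invented and are not drawn from the Brisbane dataset): two snapshots with the same total volume but very different distributions across intersections. A plain volume count cannot separate them, whereas even a simple spread summary such as the coefficient of variation, used here only as a stand-in for the inequality and spread measures discussed later, can.

```python
import numpy as np

def coefficient_of_variation(x):
    """A simple (unweighted) spread summary: standard deviation / mean."""
    x = np.asarray(x, dtype=float)
    return float(x.std() / x.mean())

# hypothetical 5-minute counts at one major and two minor intersections
free_flow = [180, 35, 25]   # major intersection near capacity, minor ones quiet
gridlock  = [90, 80, 70]    # major well below capacity, minor ones unusually busy

print(sum(free_flow), sum(gridlock))           # identical totals: 240 and 240
print(coefficient_of_variation(free_flow))     # high spread (about 0.89)
print(coefficient_of_variation(gridlock))      # low spread (about 0.10)
```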

In this contribution, we will consider weighted versions of these indices and functions with a view to their practical application in measuring congestion and, more broadly, in decision making. As an illustrative example of their use, we will use a subset of traffic data obtained from Brisbane City Council (in Australia) measuring the volume of traffic passing through various intersections over 5-minute intervals.

We will organize our contribution according to the following structure. In the Preliminaries section, we will give the necessary background and formulas for aggregation functions, inequality functions, spread measures, and consensus measures. In Section 3, we formulate the linear programming approaches required for fitting our simple congestion metrics to data. In Section 4, we use the traffic volume and median velocities for learning weights and measuring congestion across a subset of the network. We look at the performance of each measure in terms of Spearman correlation [33], both for fitting and for prediction. In Section 5, we will provide some discussion and outline some avenues for future research, before providing concluding remarks in Section 6.

Section snippets

Preliminaries

We consider the topics of aggregation, inequality, spread and consensus in the context of measuring traffic congestion, where inputs will usually relate to a set of intersection volumes; however, these are of course also relevant to multi-criteria evaluation and decision making in general.
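
Although the full definitions are given in the paper's Preliminaries, a minimal sketch of two of the ingredients, in standard textbook form rather than necessarily the exact formulation used here, is shown below: a weighted arithmetic mean as a basic aggregation function, and a weighted Gini-type inequality index based on pairwise absolute differences. The function names, the normalisation chosen, and the toy inputs are illustrative assumptions.

```python
import numpy as np

def weighted_mean(x, w):
    """Weighted arithmetic mean: a basic averaging aggregation function."""
    return float(np.dot(w, x))

def weighted_gini(x, w):
    """A textbook weighted Gini-type inequality index: pairwise absolute
    differences weighted by w_i * w_j, normalised by twice the weighted
    mean (0 = all inputs equal, larger values indicate more inequality)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = np.dot(w, x)
    if mu == 0:
        return 0.0
    diffs = np.abs(x[:, None] - x[None, :])          # |x_i - x_j|
    return float(np.sum(np.outer(w, w) * diffs) / (2.0 * mu))

# toy example: three intersections, the first given the largest weight
volumes = [120, 40, 30]
weights = [0.5, 0.3, 0.2]                 # non-negative, summing to one
print(weighted_mean(volumes, weights))    # 78.0
print(weighted_gini(volumes, weights))    # about 0.28
```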

Learning spread measure and inequality weights from comparison data

To use weighted functions for assessing the level of congestion, we need a way of choosing the appropriate weights. Functions such as the Gini index are formulated with respect to fixed (equal) weights; however, in our context it is likely that some intersections will have a higher influence on the level of congestion, e.g. due to the topology of the road network.

We assume datasets consisting of m × n matrices, where each row x_k = (x_{k,1}, x_{k,2}, …, x_{k,n}) denotes the number of cars passing
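
Section 3 formulates linear programming problems for this fitting task. As a rough sketch of the idea, consider the simplest case of a weighted volume index (a weighted mean): given comparison pairs (a, b) stating that observation a was judged at least as congested as observation b, we look for non-negative weights summing to one that respect these orderings up to slack variables whose total is minimised. The function name, the margin parameter and the use of scipy.optimize.linprog are illustrative assumptions rather than the paper's implementation, and fitting the spread and inequality indices themselves requires additional linearisation steps not shown here.

```python
import numpy as np
from scipy.optimize import linprog

def fit_weights_from_comparisons(X, comparisons, margin=1e-3):
    """Fit weights w (w_i >= 0, sum w = 1) of a weighted volume index <w, x>
    so that for each pair (a, b) in `comparisons` (row a judged at least as
    congested as row b) we encourage <w, X[a]> >= <w, X[b]> + margin,
    softened by slack variables whose total is minimised."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    K = len(comparisons)

    # decision vector: [w_1 .. w_n, xi_1 .. xi_K]
    c = np.concatenate([np.zeros(n), np.ones(K)])      # minimise total slack

    A_ub = np.zeros((K, n + K))
    b_ub = np.full(K, -margin)
    for k, (a, b) in enumerate(comparisons):
        A_ub[k, :n] = -(X[a] - X[b])    # -(x_a - x_b).w - xi_k <= -margin
        A_ub[k, n + k] = -1.0

    A_eq = np.concatenate([np.ones(n), np.zeros(K)]).reshape(1, -1)
    b_eq = np.array([1.0])              # weights sum to one

    bounds = [(0, None)] * (n + K)      # w >= 0, slack >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[n:]

# toy usage: 4 observations over 3 intersections; observation 0 judged more
# congested than 1, and observation 2 more congested than 3
X = [[80, 30, 20], [40, 35, 25], [90, 10, 15], [50, 45, 10]]
w, slack = fit_weights_from_comparisons(X, [(0, 1), (2, 3)])
print(np.round(w, 3), np.round(slack, 3))
```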

Evaluation of various metrics on the Brisbane traffic dataset

We obtained volumetric data (the number of cars passing through an intersection over a 5-min interval, observed using vehicle-detecting road sensors) and median velocity data (extracted from Bluetooth GPS data for a sample of 3000 cars over the entire network) from the Brisbane City Council for 8 weekdays from September 5 to September 14,
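
In the paper, fit and predictive performance are assessed using Spearman correlation [33] against the median velocity data. A minimal sketch of such a check, with invented numbers standing in for an aligned congestion-index and median-velocity time series, might look as follows (scipy.stats.spearmanr is an assumed tooling choice rather than the authors' implementation).

```python
from scipy.stats import spearmanr

# hypothetical values over six aligned 5-minute intervals
congestion_index = [0.21, 0.35, 0.62, 0.80, 0.55, 0.30]   # fitted index
median_velocity  = [48.0, 42.5, 31.0, 24.5, 33.0, 45.0]   # observed km/h

rho, p_value = spearmanr(congestion_index, median_velocity)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
# a strongly negative rho indicates the index rises as travel speeds fall
```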

Discussion and future work

Here we have proposed methods for learning weighted spread and inequality indices and validated their potential using a small real-world dataset. In the process, a number of potential improvements that could increase the performance have been identified, although we note that the best functions and parameters to use will vary from dataset to dataset, and that the learning mechanism may need to be adjusted depending on the observed ‘true’ evaluations of congestion.

In our case, median velocity

Conclusion

We investigated a practical application of inequality and spread measures and proposed methods for learning the weights of such functions from comparison data. We investigated the performance of such techniques when modeling congestion based on counts of traffic passing through multiple intersections throughout a city's road network. We found that although volume and weighted volume were, in general, more reliable than inequality indices, metrics that combined the two provided even better

Acknowledgement

The authors would like to acknowledge Brisbane City Council and GCS Agile for providing the data used for our experiments.

References (41)

  • K. Wu et al.

    Traffic and emissions impact of congestion charging in the central Beijing urban area: a simulation analysis

    Transp. Res. D

    (2017)
  • S. Yang

    On feature selection for traffic congestion prediction

    Transp. Res. C

    (2013)
  • X. Zhang et al.

    Hierarchical fuzzy rule-based system optimized with genetic algorithms for short term traffic congestion prediction

    Transp. Res. C

    (2014)
  • R.V. Alatalo

    Problems in the measurement of evenness in ecology

    Oikos

    (1981)
  • J. Alcalde-Unzu et al.

    Measuring consensus: concepts, comparisons and properties

    Consensual Processes, STUDFUZZ, vol. 267

    (2011)
  • O. Aristondo et al.

    The Gini index, the dual decomposition of aggregation functions, and the consistent measurement of inequality

    Int. J. Intell. Syst.

    (2012)
  • G. Beliakov et al.

    A Practical Guide to Averaging Functions

    (2015)
  • G. Beliakov et al.

    Penalty-based and other representations of economic inequality

    Int. J. Uncertain. Fuzziness Knowl. Based Syst.

    (2016)
  • G. Beliakov et al.

    Learning aggregation weights from 3-tuple comparison sets

  • D.J. Best et al.

    Algorithm AS 89: the upper tail probabilities of Spearman's rho

    Appl. Stat.

    (1975)
Cited by (11)

    • Estimating congestion zones and travel time indexes based on the floating car data

      2021, Computers, Environment and Urban Systems
      Citation excerpt:

      Regarding the congestion estimation on the micro level, several approaches used basic statistical measures of speed or density to estimate traffic congestion (D'Andrea & Marcelloni, 2017; He et al., 2016; Sun et al., 2019). Some approaches used higher complexity statistical models to estimate congestion: particle swarm optimization coupled with fuzzy module and saturation, density, and speed as traffic flow parameters (Kong et al., 2016); Multiple Data Estimation (MDE) model with density, velocity, inflow, and previous status parameters (Yang et al., 2017); and learning weighted inequality and spread indexes (Beliakov et al., 2018). Several approaches considered spatio-temporal correlations when estimating congestion: closely time-related dynamic neighborhoods of traffic flow (Shi et al., 2018), spatio-temporal connectivity of trajectories on the turn level (Kan et al., 2019), and links' Speed Transition Matrices (STMs) and Markov chain procedure (Tišljarić et al., 2020).

    • A generalization of stability for families of aggregation operators

      2020, Fuzzy Sets and Systems
      Citation excerpt:

      Strict stability (or self identity property under symmetry) [24,27] produces a natural kind of robustness within a FAO, since the aggregation operators of a strict stable FAO are forced to hold some transversal continuity between aggregation operators of different cardinality, assuring a specific consistency no matter if the cardinality of the aggregation process changes. This is why such stability has been considered in different contexts, as weights determination for weighted average mean [11], missing data problems [2,3,16,17,26] or the development of contextual indexes [25], among others. The problem of assuring some kind of consistency for the operators within a FAO is in our opinion a key issue in aggregation.

    • Constructing the geometric Bonferroni mean from the generalized Bonferroni mean with several extensions to linguistic 2-tuples for decision-making

      2019, Applied Soft Computing Journal
      Citation excerpt:

      Ideas and methods from both algebraic and geometric perspectives are extremely valuable in this context [1,7,8]. In particular, such approaches have led to significant improvements in the ability of aggregation functions to model a variety of information fusion applications, including, among others, railway risk information processing [9], multi-attribute utility analysis [10], sustainability assessment [11,12], multi-attribute decision rules [13], species diversity modeling [14], traffic congestion measurement [15], image processing [16,17], social network analysis [18], recommendation systems [19], and water quality assessment [20]. An aggregation function can be classified as conjunctive, disjunctive, averaging, or hybrid, and, in the context of multiple-attribute (group) decision-making (MADM/MAGDM), provides an effective indicator that can be used to gauge the performance of alternatives [21].

    • Design of a heuristic environment-friendly road pricing scheme for traffic emission control under uncertainty

      2019, Journal of Environmental Management
      Citation excerpt:

      However, design of such a toll scheme may be complicated when vehicle emissions are characterized by uncertainties. In addition, the traffic indicators used for congestion assessment lack an effective link to the environment-friendly toll strategies in the real-world transportation system management practice (Beliakov et al., 2018; Younes and Boukerche, 2015). These existing issues place the problems of the pricing scheme design beyond the conventional optimization approaches, which should be considered.
