Measuring traffic congestion: An approach based on learning weighted inequality, spread and aggregation indices from comparison data
Graphical abstract
Comparison of smoothed time series data for velocity, volume, weighted spread (using proportions calculated from relative volume) and an adjusted volume index.
Introduction
When making decisions based on sets of numerical data, as data analysts we will usually be interested in summaries that help us identify trends, central tendency, spread, etc., and which can be used to make objective comparisons. Classical operators such as the arithmetic mean and median have been recognized as special cases of much broader families of aggregation functions, which have been studied in depth in the areas of decision-making and fuzzy systems [7], [27], [35]. However, in many contexts there is also the need for a more dedicated study of summaries that indicate the variation or spread of data [23]. In particular, we have measures which evaluate the level of income inequality in economics [4], [20], [26], the evenness of species distributions in ecology [2], [32], [34], and disagreement between experts in group decision making [3], [8], [17], [1], [24].
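As a concrete illustration of the inequality measures mentioned above, the classical (unweighted) Gini index can be computed directly from pairwise absolute differences. The sketch below uses the standard mean-difference formula; the function name is ours and the weighted variants discussed later generalize this form:

```python
def gini(x):
    """Classical (unweighted) Gini index over non-negative inputs:
    0 for perfectly equal inputs, approaching 1 as inequality grows.
    Computed as the mean absolute pairwise difference, normalized by
    twice the mean."""
    n = len(x)
    mean = sum(x) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in x for b in x)
    return diff_sum / (2 * n * n * mean)
```

For example, `gini([1, 1, 1, 1])` is 0 (perfect equality), while `gini([0, 0, 0, 4])`, where one input holds everything, gives 0.75.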
Traffic analysis is a topic of interest across various research fields, with the core problems of understanding, modeling, predicting and reducing traffic congestion being important not only for efficient infrastructure and logistics, but also for environmental impact. Traffic and network simulation has been useful both in testing different theories of flow (e.g. [29]) and in predicting the impact of control measures, e.g. charging for entry to central business zones [37]. Real data are obtained either from stationary sensors (cameras, vehicle detectors, etc.) [19], [39], [41] or, more recently, from GPS trajectories recorded by devices embedded in vehicles (especially vehicles such as taxis [36], where privacy concerns are considered less relevant).
Decision makers and most real-time automated systems in traffic management rely on traffic volume data (e.g. see [31]) that counts the number of cars passing through an intersection over a given time interval (although across some freeway networks there may also be average speeds available [19]). Rather than volume, road users, council decision makers and traffic managers will usually be interested in the level of congestion experienced across a given region or large network. Whereas volume can be measured objectively, the notion of congestion is somewhat more difficult to define. It has been approached as a binary classification task in [39], i.e. where an intersection is considered to be congested when the volume exceeds a certain threshold, while in [21] an expert system was proposed that distinguishes between incidents and congestion. We are interested in developing reliable indices of congestion given over a continuous scale so that the impact of potential improvements to the network, e.g. from road work, new highways, changes to traffic light sequences, etc., can be measured. By being able to objectively measure congestion in a way that reflects the road-user experience (i.e. traffic jams and slower travel speeds), decision makers can then consider how best to reduce congestion.
Periods of high volume will certainly often correspond with drivers experiencing high levels of congestion; however, in terms of the number of cars passing through an intersection, low counts can also be indicative of high congestion. While in [39] the congestion prediction problem was approached as one of feature selection in the presence of correlated variables, intuitively the function behavior we are interested in is one whose output tells us when, in a local area of the network, large intersections have counts below their capacity while other intersections are all busier than normal. For this we turn to inequality, spread and consensus functions, all of which summarize a dataset's variation, albeit from slightly different perspectives.
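The intuition above can be made concrete with a simple spread measure over volume-to-capacity proportions: such a measure is large precisely when some intersections run well below capacity while others are unusually busy. The sketch below is illustrative only — the function name, the capacity normalization and the use of the sample standard deviation are our assumptions, not the paper's formulation:

```python
import statistics


def relative_spread(volumes, capacities):
    """Spread (sample standard deviation) of volume-to-capacity
    proportions across a set of intersections.  Balanced traffic gives
    a spread near 0; a mix of underused and overloaded intersections
    gives a large spread."""
    props = [v / c for v, c in zip(volumes, capacities)]
    return statistics.stdev(props)
```

With two equal-capacity intersections, balanced counts such as `[50, 50]` give a spread of 0, whereas the imbalanced `[10, 90]` gives a much larger value, even though total volume is identical in both cases.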
In this contribution, we will consider weighted versions of these indices and functions toward their practical application in measuring congestion and more broadly for decision making applications. As an illustrative example of their use, we will use a subset of traffic data obtained from Brisbane City Council (in Australia) measuring the volume of traffic passing through various intersections over 5-minute intervals.
We will organize our contribution according to the following structure. In the Preliminaries section, we will give the necessary background and formulas for aggregation functions, inequality functions, spread measures, and consensus measures. In Section 3, we formulate the linear programming approaches required for fitting our simple congestion metrics to data. In Section 4, we use the traffic volume and median velocities for learning weights and measuring congestion across a subset of the network. We look at the performance of each measure in terms of Spearman correlation [33] both in fitting and for use in prediction. In Section 5, we will provide some discussion and outline some avenues for future research, before providing concluding remarks in Section 6.
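The Spearman correlation [33] used for evaluation compares the ranking induced by a candidate congestion index with an observed ranking. For inputs without ties it reduces to the classic rank-difference formula ρ = 1 − 6 Σ dᵢ² / (n(n² − 1)); a minimal no-ties implementation:

```python
def spearman_rho(a, b):
    """Spearman rank correlation between two sequences, assuming no
    ties, via rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is
    the difference between the ranks of a[i] and b[i]."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Identical orderings give ρ = 1, fully reversed orderings give ρ = −1; for tied data a tie-aware implementation (e.g. `scipy.stats.spearmanr`) would be needed.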
Section snippets
Preliminaries
We consider the topics of aggregation, inequality, spread and consensus in the context of measuring traffic congestion, where inputs will usually relate to a set of intersection volumes, however these are of course also relevant to multi-criteria evaluation and decision making in general.
Learning spread measure and inequality weights from comparison data
To use weighted functions for assessing the level of congestion, we need a way of choosing the appropriate weights. Functions such as the Gini index are formulated with respect to fixed (equal) weights, however in our context it is likely the case that some intersections will have a higher influence on the level of congestion, e.g. due to the topology of the road network.
We assume datasets consisting of m × n matrices, where each row xk = (xk,1, xk,2, …, xk,n) denotes the number of cars passing
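The linear programming formulation of Section 3 is only summarized in this snippet. As a rough sketch of the general idea, assuming a weighted-sum congestion score F(x) = w·x, weights can be fit to pairwise comparisons (row k judged more congested than row l) by minimizing slack variables that absorb comparisons the model cannot satisfy. Function and parameter names below are ours, and the paper's exact constraints may differ:

```python
import numpy as np
from scipy.optimize import linprog


def learn_weights(X, comparisons, eps=1e-3):
    """Learn weights w (w >= 0, sum w = 1) for a weighted-sum score
    F(x) = w.x from pairwise comparisons (k, l) meaning row k is judged
    more congested than row l.  Each comparison contributes a constraint
    w.(x_k - x_l) + xi_i >= eps; the total slack sum(xi) is minimized."""
    m, n = X.shape
    p = len(comparisons)
    # Decision vector: [w_1..w_n, xi_1..xi_p]; objective is total slack.
    c = np.concatenate([np.zeros(n), np.ones(p)])
    A_ub = np.zeros((p, n + p))
    b_ub = np.full(p, -eps)
    for i, (k, l) in enumerate(comparisons):
        A_ub[i, :n] = -(X[k] - X[l])  # -(x_k - x_l).w - xi_i <= -eps
        A_ub[i, n + i] = -1.0
    A_eq = np.concatenate([np.ones(n), np.zeros(p)])[None, :]  # sum w = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + p), method="highs")
    return res.x[:n]
```

For instance, with two intersections and a single comparison saying the row `(1, 5)` is more congested than `(5, 1)`, the learned weights place more mass on the second intersection.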
Evaluation of various metrics on the Brisbane traffic dataset
We obtained volumetric data (the number of cars passing through an intersection over a 5-min interval, observed using vehicle-detecting road sensors) and median velocity data (extracted from Bluetooth GPS data for a sample of 3000 cars over the entire network) from the Brisbane City Council for 8 weekdays from September 5 to September 14.
Discussion and future work
Here we have proposed methods for learning weighted spread and inequality indices and validated their potential using a small real-world dataset. In the process, a number of potential improvements that could increase the performance have been identified, although we note that the best functions and parameters to use will vary from dataset to dataset, and that the learning mechanism may need to be adjusted depending on the observed ‘true’ evaluations of congestion.
In our case, median velocity
Conclusion
We investigated a practical application of inequality and spread measures and proposed methods for learning the weights of such functions from comparison data. We investigated the performance of such techniques when modeling congestion based on counts of traffic passing through multiple intersections throughout a city's road network. We found that although volume and weighted volume were, in general, more reliable than inequality indices, metrics that combined the two provided even better
Acknowledgement
The authors would like to acknowledge Brisbane City Council and GCS Agile for providing the data used for our experiments.
References

- et al., A review of soft consensus models in a fuzzy environment, Inf. Fusion (2014)
- et al., Classical inequality indices, welfare and illfare functions, and the dual decomposition, Fuzzy Sets Syst. (2013)
- Construction of aggregation functions from data using linear programming, Fuzzy Sets Syst. (2009)
- et al., Consensus measures constructed from aggregation functions and fuzzy implications, Knowl. Based Syst. (2014)
- et al., Can indices of ecological evenness be used to measure consensus?
- et al., Using aggregation functions to model human judgements of species diversity, Inf. Sci. (2015)
- et al., Detection of traffic congestion incidents from GPS trace analysis, Expert Syst. Appl. (2017)
- Inequality, poverty and welfare
- Spread measures and their relation to aggregation functions, Eur. J. Oper. Res. (2015)
- Traffic and emissions impact of congestion charging in the central Beijing urban area: a simulation analysis, Transp. Res. D
- On feature selection for traffic congestion prediction, Transp. Res. C
- Hierarchical fuzzy rule-based system optimized with genetic algorithms for short-term traffic congestion prediction, Transp. Res. C
- Problems in the measurement of evenness in ecology, Oikos
- Measuring consensus: concepts, comparisons and properties, Consensual Processes, STUDFUZZ, vol. 267
- The Gini index, the dual decomposition of aggregation functions, and the consistent measurement of inequality, Int. J. Intell. Syst.
- A Practical Guide to Averaging Functions
- Penalty-based and other representations of economic inequality, Int. J. Uncertain. Fuzziness Knowl. Based Syst.
- Learning aggregation weights from 3-tuple comparison sets
- Algorithm AS 89: the upper tail probabilities of Spearman's rho, Appl. Stat.