Elsevier

Atmospheric Research

Volume 213, 15 November 2018, Pages 450-464
Multi-stage hybridized online sequential extreme learning machine integrated with Markov Chain Monte Carlo copula-Bat algorithm for rainfall forecasting

https://doi.org/10.1016/j.atmosres.2018.07.005

Highlights

  • OS-ELM hybridized with MCMC copula-Bat model is applied for rainfall prediction.

  • Optimal MCMC-Cop-Bat-OS-ELM is based on 25 copulas ranked by Bat algorithm.

  • Optimal model is benchmarked against MCMC-Cop-Bat-ELM and MCMC-Cop-Bat-RF.

  • MCMC-Cop-Bat-OS-ELM yields accurate 1-month lead forecasts.

  • MCMC-Cop-Bat-OS-ELM is a viable predictive model for water management.

Abstract

To ameliorate the agricultural impacts of persistent drought risk by promoting sustainable utilization and pre-planning of water resources, accurate rainfall forecasting models that address the dynamic nature of the drought phenomenon are crucial. In this paper, a multi-stage probabilistic machine learning model is designed and evaluated for forecasting monthly rainfall. The multi-stage hybrid MCMC-Cop-Bat-OS-ELM model, which utilizes online sequential extreme learning machines integrated with Markov Chain Monte Carlo (MCMC) based bivariate copulas and the Bat algorithm, is employed to incorporate significant antecedent rainfall (t–1) as the model's predictor in the training phase. After computing the partial autocorrelation function (PACF) at the first stage, twenty-five MCMC-based copulas (e.g., Gaussian, t, Clayton, Gumbel, Frank and Fischer-Hinzmann) are adopted at the second stage to determine the dependence between the antecedent month's rainfall and the current and future rainfall. At the third stage, the Bat algorithm is applied to select the optimal MCMC-copula model through a feature selection strategy. At the fourth stage, the PACFs of the optimal MCMC-copula model are computed to couple the output with the OS-ELM algorithm and forecast future rainfall values in an independent test dataset. For benchmarking, the standalone extreme learning machine (ELM) and random forest (RF) models are also integrated with MCMC-based copulas and the Bat algorithm, yielding the hybrid MCMC-Cop-Bat-ELM and MCMC-Cop-Bat-RF models. The proposed multi-stage hybrid model is tested in the agricultural belt regions of Faisalabad, Jhelum and Multan in Pakistan. The testing performance of all three hybridized models, according to robust statistical error metrics, is satisfactory in comparison to the standalone counterparts; however, the multi-stage, hybridized MCMC-Cop-Bat-OS-ELM model is found to be a superior tool for forecasting monthly rainfall.
This multi-stage probabilistic learning model can be explored as a pertinent decision-support tool for agricultural water resources management in arid and semi-arid regions where a statistically significant relationship with antecedent rainfall exists.
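
The first stage described above screens antecedent rainfall lags with the PACF. A minimal Python sketch of that screening step follows, assuming a Durbin-Levinson estimate of the PACF and the conventional ±1.96/√n significance band. The function names `pacf` and `significant_lags`, the band, and the idea of passing a plain monthly series are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Sample autocorrelations r[0..max_lag], with r[0] == 1.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag + 1)]) / np.dot(x, x)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    for k in range(1, max_lag + 1):
        prev = phi[k - 1, 1:k]                      # phi_{k-1,1..k-1}
        num = r[k] - np.dot(prev, r[k - 1:0:-1])    # r_k - sum phi_{k-1,j} r_{k-j}
        den = 1.0 - np.dot(prev, r[1:k])            # 1 - sum phi_{k-1,j} r_j
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        out[k] = phi[k, k]
    return out

def significant_lags(x, max_lag):
    """Lags whose PACF magnitude exceeds the assumed 95% band 1.96/sqrt(n)."""
    band = 1.96 / np.sqrt(len(x))
    p = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(p[k]) > band]
```

Calling `significant_lags(monthly_rainfall, 12)` on a station series would return the candidate lags; a result containing 1 corresponds to using rainfall at (t–1) as the predictor.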

Introduction

Anthropogenic and naturally-induced anomalies in regional-scale rainfall can directly affect the agricultural sector since rainfall plays a vital role in both the growth and the production of crops (Maraseni et al., 2012; Nguyen-Huy et al., 2018). The effect is not restricted to the agricultural sector: it also brings major water-related disasters (Barredo, 2007), such as long-run rainfall shortages that lead to drought events (Palmer, 1965). This can lead to water scarcity (Langridge et al., 2006; Vörösmarty et al., 2010), while excessive amounts of rainfall can cause flooding and damage to human and wildlife health, infrastructure and the economy (Bhalme and Mooley, 1980). The economy of Pakistan, a nation that is still in its developing phase, has also been severely damaged by major flooding events, including damage to infrastructure and agricultural crops (News, 2010). The estimated damage to infrastructure in the 2010 event was approximately 4 billion US dollars, whereas the damage in the agricultural sector amounted to about 500 million US dollars (Hicks and Burton, 2010). The total economic damage was considerable, amounting to approximately 43 billion US dollars in 2010 (Mansoor, 2010; Tarakzai, 2010). Equally, drought events (Ali et al., 2018) have been a major contributing factor towards reduced agricultural yields and significant reductions in the gross domestic product of Pakistan. Further, a prolonged decline in rainfall can cause a fall in hydraulic heads, with severe consequences for crop irrigation from wells due to changes in the properties of groundwater reservoirs (Santos et al., 2014). Therefore, accurate rainfall forecasting, particularly in agricultural belt regions, can improve the ability of stakeholders to formulate better water planning and resource management decisions.

Data-intelligent models, particularly those developed for local (e.g., farm) scales, can utilize past data and hence may offer a viable and reasonably accurate solution to drought disaster management through the projection of future rainfall (Luk et al., 2001). Chiew et al. (1998) developed data-intelligent predictive models for rainfall forecasting using an empirical method, whereas Sharma (2000) developed a nonparametric probabilistic model to forecast seasonal to inter-annual rainfall in Australia. Burlando et al. (1993) used an autoregressive moving average (ARMA) model for short-term rainfall forecasting in the USA, Hung et al. (2009) applied an artificial neural network (ANN) model for rainfall forecasting in Thailand, and Lin et al. (2009) forecasted hourly rainfall in Taiwan using support vector machines. Yaseen et al. (2017) developed a rainfall forecasting model for the Pahang river catchment in the Malaysian Peninsula using a novel hybrid intelligent model based on the adaptive neuro-fuzzy inference system (ANFIS) integrated with the Firefly algorithm (FFA); Mason (1998) forecasted seasonal rainfall in South Africa using a nonlinear discriminant analysis model; and Nguyen-Huy et al. (2017) developed a novel copula-statistical rainfall forecasting model for Australia's agro-ecological zones. Accurate rainfall forecasting is a significant challenge for Pakistan due to high variation in seasonal, annual and inter-annual rainfall, exacerbated by climate change.

Despite the need, only a few studies on rainfall forecasting, particularly at local or regional scales, have been carried out in Pakistan. For example, Salma et al. (2012) forecasted rainfall trends in different climatic zones of Pakistan using the autoregressive integrated moving average (ARIMA) model. Archer and Fowler (2008) applied meteorological data to forecast seasonal runoff on the River Jhelum, Pakistan, on the basis of multiple linear regression models. Reale et al. (2012) forecasted an extreme rainfall event (in the Indus River Valley, Pakistan, 2010) with a global data assimilation and forecasting model. Faisal and Gaffar (2012) utilized the Thiessen polygon method for weighted rainfall forecasts in Pakistan, whereas Ahasan and Khan (2013) simulated flood-producing rainfall events in 2010 over north-west Pakistan using the Weather Research and Forecasting (WRF) model. These studies have provided immensely useful information to various stakeholders, revealing the capability of data-driven models to generate acceptably accurate rainfall forecasts where only historical datasets were applied to construct the forecast model.

The aforementioned studies in Pakistan (Ahasan and Khan, 2013; Archer and Fowler, 2008; Faisal and Gaffar, 2012; Reale et al., 2012; Salma et al., 2012) indicate that rainfall forecasting has mostly relied on statistical models. In addition, a majority of these studies forecast seasonal rainfall using several different datasets. Moreover, advanced data-intelligent models (which account for the significantly non-linear behavior of rainfall and its predictors) have seen limited application to accurate forecasting at a micro (or landscape) scale, where they could support decision-making for better management of water resources and flood modelling aimed at reducing the overall risk. For example, accurate forecasting is beneficial at the catchment scale for agro-forestry applications (Terêncio et al., 2017, 2018). Accurate rainfall forecasting can also have several economic benefits; for example, a realistic forecast of heavy rainfall could allow airline dispatchers to route their flights in a timely manner (Graham, 2002). In addition, a more accurate rainfall forecasting tool might enable appropriate decisions about flooding, crop sowing and harvesting, and the management of water resources (Graham, 2002; Jones et al., 2000; Toth et al., 2000). To address these issues, there is an apparent need for data-intelligent models that forecast rainfall more accurately than the current statistically-based (i.e., regression) approaches, which rely on various data distribution or linearity assumptions.

In this study, for the first time, a multi-stage online sequential extreme learning machine (OS-ELM) model integrated with Markov Chain Monte Carlo (MCMC) based copulas and the Bat algorithm is developed, denoted as the “MCMC-Cop-Bat-OS-ELM model”. For comparison, the standalone extreme learning machine (ELM), without any hybridization, and random forest (RF) models are also developed. The proposed multi-stage MCMC-Cop-Bat-OS-ELM model is tested for rainfall forecasting in three agricultural districts of Pakistan: Faisalabad, Multan and Jhelum. The novelty of this study is, therefore, the design and application of the newly proposed multi-stage, hybrid MCMC-Cop-Bat-OS-ELM model for rainfall forecasting in Pakistan, a developing nation where accurate predictions are likely to bring significant benefits to agriculture, climate adaptation and decision-making in the water resources sector.

To test the applicability of the proposed multi-stage MCMC-Cop-Bat-OS-ELM model, this study fulfils four objectives: (1) To develop a probabilistic MCMC-based copula model integrated with the Bat algorithm in order to determine the optimal MCMC-copula model; (2) To incorporate the optimal MCMC-copula model selected by the Bat algorithm in the OS-ELM model to develop a multi-stage MCMC-Cop-Bat-OS-ELM hybrid prediction tool; (3) To incorporate the significant antecedent lagged rainfall to effectively forecast the current and future rainfall in the subsequent month; and (4) To validate the forecasting ability of the proposed hybrid MCMC-Cop-Bat-OS-ELM model for rainfall forecasting in Pakistan.
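
Objective (1) rests on bivariate copulas linking rainfall at (t–1) to rainfall at t. As a rough illustration using one family named in the abstract (Clayton), the Python sketch below evaluates the Clayton copula CDF on rank-based pseudo-observations and estimates its parameter θ by inverting Kendall's tau, θ = 2τ/(1 − τ). This moment-style estimator is swapped in for brevity and is not the MCMC estimation used in the study; all function names are hypothetical.

```python
import numpy as np

def lagged_pairs(series):
    """Pair each value with its predecessor: (rain at t-1, rain at t)."""
    s = np.asarray(series, dtype=float)
    return s[:-1], s[1:]

def pseudo_obs(x):
    """Map a sample to (0, 1) via ranks; assumes no ties in x."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def kendall_tau(u, v):
    """Kendall's tau (tau-a): concordant minus discordant pairs, O(n^2)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    n = len(u)
    s = 0.0
    for i in range(n - 1):
        s += np.sum(np.sign(u[i + 1:] - u[i]) * np.sign(v[i + 1:] - v[i]))
    return 2.0 * s / (n * (n - 1))

def clayton_theta(tau):
    """Invert tau = theta / (theta + 2); valid only for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("Clayton copula needs positive dependence (0 < tau < 1)")
    return 2.0 * tau / (1.0 - tau)

def clayton_cdf(u, v, theta):
    """C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)
```

In use, one would pass `lagged_pairs(monthly_rainfall)` through `pseudo_obs`, estimate τ and θ, and then compare the fitted copula against other families; the study ranks twenty-five such candidates with the Bat algorithm rather than fitting a single family.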

The literature on accurate rainfall forecasting shows that several approaches were adopted using data intelligent models.

Previous work

Accurate rainfall forecasting plays a key role in agriculture, water resources and early flood warning systems (Yaseen et al., 2016, 2017, 2018). Ortiz-García et al. (2014) used meteorological variables and observational data to forecast rainfall in Spain using support vector classifiers in comparison with multi-layer perceptron, extreme learning machine, decision trees and K-nearest neighbor

Online sequential extreme learning machine (OS-ELM)

ELM is a state-of-the-art data-intelligent model developed by Huang et al. (2006) for designing a Single Layer Feedforward Neural Network (SLFN). ELM is relatively fast, and thus computationally efficient, compared with traditional learning algorithms (Rajesh and Prakash, 2011; Deo and Şahin, 2015; Deo et al., 2017). The SLFN with M hidden nodes for N arbitrary inputs $(x_k, y_k) \in \Gamma^n \times \Gamma^n$ with an activation function $f(\cdot)$ can be mathematically formulated as:

$$\sum_{i=1}^{M} \rho_i \, f(x_k; c_i, w_i) = y_k$$
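
In an ELM, the hidden-node parameters (c_i, w_i) are assigned randomly and only the output weights ρ are solved for. A minimal NumPy sketch of the formulation above is given here, assuming a sigmoid activation and a small ridge term for numerical stability (both are assumptions, as are all function names). It shows the batch solution together with the recursive least-squares chunk update commonly used in OS-ELM, and is an illustration of the technique rather than the authors' code.

```python
import numpy as np

def hidden(X, W, c):
    """Sigmoid hidden-layer outputs f(x_k; c_i, w_i) for every sample and node."""
    return 1.0 / (1.0 + np.exp(-(X @ W + c)))

def elm_fit(X, y, n_hidden=10, ridge=1e-2, seed=0):
    """Batch ELM: random (W, c), then ridge least squares for rho."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # input weights w_i (random, fixed)
    c = rng.normal(size=n_hidden)                 # biases c_i (random, fixed)
    H = hidden(X, W, c)
    P = np.linalg.inv(H.T @ H + ridge * np.eye(n_hidden))
    rho = P @ H.T @ y                             # output weights
    return W, c, rho, P

def oselm_update(X_new, y_new, W, c, rho, P):
    """One sequential step: update rho and P with a new chunk of data."""
    H = hidden(X_new, W, c)
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P                   # Woodbury update of the inverse
    rho = rho + P @ H.T @ (y_new - H @ rho)       # correct rho by the chunk residual
    return rho, P

def elm_predict(X, W, c, rho):
    """Network output sum_i rho_i f(x_k; c_i, w_i)."""
    return hidden(X, W, c) @ rho
```

Feeding the data in chunks through `oselm_update` reproduces the batch ridge solution on the pooled data, which is what allows the model to be updated as new monthly observations arrive without retraining from scratch.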

Rainfall data

In this paper, we use rainfall data obtained from the Pakistan Meteorological Department, Pakistan, for the years 1981 to 2015 (PMD, 2016) for the selected regions of Faisalabad, Multan and Jhelum in Punjab, as shown in Fig. 1.

To evaluate the versatility of the multi-stage, hybridized MCMC-Cop-Bat-OS-ELM model for rainfall forecasting in Pakistan's agricultural belt, the study sites were chosen carefully to ensure that they were broadly representative of the diverse climatic conditions. The first

Results

The results of the MCMC-Cop-Bat-OS-ELM model and the comparative models, MCMC-Cop-Bat-ELM and MCMC-Cop-Bat-RF, have been evaluated based on the above criteria (Eqs. (19)–(30)). Table 3 shows the MCMC-copula models selected by the Bat algorithm on the basis of feature selection. Out of a total of twenty-five tested MCMC-copula models, seven were selected to be the best by the Bat algorithm for Faisalabad, eight for Jhelum and seven for

Discussion: limitations and opportunity for further research

Accurate rainfall forecasting can complement and facilitate better planning of water management (Terêncio et al., 2018; Ali et al., 2018; Yaseen et al., 2016, 2017, 2018). Furthermore, accurate predictions of rainfall can reduce water-related natural disasters (Barredo, 2007; Langridge et al., 2006; Palmer, 1965; Vörösmarty et al., 2010), and potential impacts upon wildlife health, infrastructure and the economy (Bhalme and Mooley, 1980).

Conclusion

For the first time, this paper has developed a hybrid multi-stage MCMC-Cop-Bat-OS-ELM model using the significant antecedent lags of monthly rainfall as predictor variables to forecast future rainfall for different geographical sites in Pakistan. The rainfall data from 1981 to 2015 for a total of three stations were used to develop the proposed multi-stage MCMC-Cop-Bat-OS-ELM model in order to achieve a high level of accuracy. Further, several types of evaluation criteria were adopted to

Acknowledgement

This research utilized rainfall data acquired from the Pakistan Meteorological Department, Pakistan, which is duly acknowledged. This study was supported by the University of Southern Queensland Office of Graduate Studies Postgraduate Research Scholarship (USQPRS, 2017–2019) awarded to the first author. We thank both reviewers and the Editor-in-Chief for their constructive comments, which have improved the clarity of the final paper.

References (114)

  • G.-B. Huang et al.

    Extreme learning machine: theory and applications

    Neurocomputing

    (2006)
  • J.W. Jones et al.

    Potential benefits of climate forecasting to agriculture

    Agric. Ecosyst. Environ.

    (2000)
  • T. Kashiwao

    A neural network-based local rainfall prediction system using meteorological data on the internet: a case study using data from the Japan meteorological agency

    Appl. Soft Comput.

    (2017)
  • Y. Lan et al.

    Ensemble of online sequential extreme learning machine

    Neurocomputing

    (2009)
  • K.C. Luk et al.

    An application of artificial neural networks for rainfall forecasting

    Math. Comput. Model.

    (2001)
  • T.N. Maraseni et al.

    Integrated analysis for a carbon-and water-constrained future: an assessment of drip irrigation in a lettuce production system in eastern Australia

    J. Environ. Manag.

    (2012)
  • S. Moazami et al.

    Uncertainty analysis of bias from satellite rainfall estimates using copula method

    Atmos. Res.

    (2014)
  • K. Mohammadi

    A new hybrid support vector machine–wavelet transform approach for estimation of horizontal global solar radiation

    Energy Convers. Manag.

    (2015)
  • J.E. Nash et al.

    River flow forecasting through conceptual models part I—A discussion of principles

    J. Hydrol.

    (1970)
  • T. Nguyen-Huy et al.

    Copula-statistical precipitation forecasting model in Australia's agro-ecological zones

    Agric. Water Manag.

    (2017)
  • T. Nguyen-Huy et al.

    Modeling the joint influence of multiple synoptic-scale, climate mode indices on Australian wheat yield using a vine copula-based approach

    Eur. J. Agron.

    (2018)
  • E. Ortiz-García et al.

    Accurate precipitation prediction with support vector classifiers: a study including novel predictive variables and observational data

    Atmos. Res.

    (2014)
  • S. Salcedo-Sanz et al.

    An efficient neuro-evolutionary hybrid modelling mechanism for the estimation of daily global solar radiation in the Sunshine State of Australia

    Appl. Energy

    (2018)
  • J. Sánchez-Monedero et al.

    Simultaneous modelling of rainfall occurrence and amount using a hierarchical nominal–ordinal support vector classifier

    Eng. Appl. Artif. Intell.

    (2014)
  • R. Santos et al.

    The impact of climate change, human interference, scale and modeling uncertainties on the estimation of aquifer properties and river flow components

    J. Hydrol.

    (2014)
  • A. Sharma

    Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: part 3—a nonparametric probabilistic forecast model

    J. Hydrol.

    (2000)
  • D. Terêncio et al.

    Improved framework model to allocate optimal rainwater harvesting sites in small watersheds for agro-forestry uses

    J. Hydrol.

    (2017)
  • D. Terêncio et al.

    Rainwater harvesting in catchments for agro-forestry uses: a study focused on the balance between sustainability values and storage capacity

    Sci. Total Environ.

    (2018)
  • E. Toth et al.

    Comparison of short-term rainfall prediction models for real-time flood forecasting

    J. Hydrol.

    (2000)
  • M.Q. Villafuerte

    Long-term trends and variability of rainfall extremes in the Philippines

    Atmos. Res.

    (2014)
  • S.M. Abubakar

    Pakistan 7th most Vulnerable Country to Climate Change

    (2017)
  • M. Ahasan et al.

    Simulation of a flood producing rainfall event of 29 July 2010 over north-west Pakistan using WRF-ARW model

    Nat. Hazards

    (2013)
  • H. Akaike

    A new look at the statistical model identification

    IEEE Trans. Autom. Control

    (1974)
  • I.U.H. Akhtar

    Pakistan Needs a New Crop Forecasting System

    (2014)
  • J.I. Barredo

    Major flood disasters in Europe: 1950–2005

    Nat. Hazards

    (2007)
  • H.N. Bhalme et al.

    Large-scale droughts/floods and monsoon circulation

    Mon. Weather Rev.

    (1980)
  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • J. Briggs et al.

    Turbulent Mirror: An Illustrated Guide to Chaos Theory and the Science of Wholeness

    (1989)
  • X. Cai et al.

    Bat algorithm with Gaussian walk

    Int. J. Bio-Inspired Computation.

    (2014)
  • G.-c. Chen et al.

    Particle swarm optimization algorithm

    Information and Control-Shenyang

    (2005)
  • D.G. Clayton

    A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence

    Biometrika

    (1978)
  • Q. Dai et al.

    Modelling radar-rainfall estimation uncertainties using elliptical and Archimedean copulas with different marginal distributions

    Hydrol. Sci. J.

    (2014)
  • M.P. Darji et al.

    Rainfall forecasting using neural network: a survey

  • L. Davis

    Handbook of Genetic Algorithms

    (1991)
  • De Lathauwer, L., De Moor, B., Vandewalle, J., by Higher-Order, B.S.S, 1994. Singular value decomposition. Proc....
  • R.C. Deo et al.

    A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset

    Appl. Energy

    (2017)
  • Department, P.M

    Dry Weather Predicted in the Country during Friday/Monday

    (2010)
  • T.G. Dietterich

    Ensemble learning

  • N.R. Draper et al.

    Applied Regression Analysis

    (2014)
