Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams

The Journal of Supercomputing

Abstract

Online scheduling plays a key role in big data stream computing environments, because the arrival rate of a high-velocity continuous data stream may fluctuate over time. This paper proposes E-Stream, an elastic online scheduling framework for big data streaming applications, with the following features. (1) It profiles the mathematical relationships between system response time, fairness among multiple applications, and the online features of high-velocity continuous streams. (2) It scales a data stream graph out or in by quantifying computation and communication cost and the vertex semantics with respect to the arrival rate of the data stream, and adjusts the degree of parallelism of the vertices in the graph accordingly. The graph is further partitioned into subgraphs so as to minimize data dependencies among them. (3) It elastically schedules a single graph with a priority-based earliest-finish-time-first online scheduling strategy, and schedules multiple graphs with a max–min fairness strategy. (4) It evaluates the objectives of low system response time and acceptable application fairness in a real-world big data stream computing environment. Experimental results demonstrate that E-Stream provides better system response time and application fairness than the existing Storm framework.
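
The abstract does not spell out how E-Stream applies max–min fairness across multiple graphs. As a rough illustration only, the generic progressive-filling form of max–min fair sharing can be sketched as follows. The function name, the single scalar capacity, and the per-application demand values are assumptions made for this example and are not taken from the paper.

```python
def max_min_fair_share(capacity, demands):
    """Generic progressive-filling max-min fair allocation.

    capacity: total amount of one resource (hypothetical unit, e.g. slots).
    demands:  dict mapping an application name to its requested amount.
    Returns a dict mapping each application name to its allocated amount.
    """
    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    # Applications that still want more than they have been given.
    unsatisfied = {name: float(d) for name, d in demands.items() if d > 0}

    while unsatisfied and remaining > 1e-12:
        # Equal split of what is left among the unsatisfied applications.
        share = remaining / len(unsatisfied)
        # Applications whose whole demand fits inside the equal split.
        satisfied = [n for n, d in unsatisfied.items() if d <= share]
        if not satisfied:
            # Nobody can be fully served: everyone gets the equal split.
            for n in unsatisfied:
                allocation[n] += share
            remaining = 0.0
            break
        # Serve the small demands in full and redistribute the surplus.
        for n in satisfied:
            allocation[n] += unsatisfied[n]
            remaining -= unsatisfied[n]
            del unsatisfied[n]
    return allocation


if __name__ == "__main__":
    # Three hypothetical streaming applications competing for 10 units.
    print(max_min_fair_share(10, {"app_a": 2, "app_b": 4, "app_c": 8}))
```

In this sketch, small demands are met in full and the leftover capacity is split evenly among the applications that still want more, so no application can gain resources without taking them from one that already holds an equal or smaller share.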

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61602428; the Fundamental Research Funds for the Central Universities under Grant No. 2652015338; and Melbourne-Chindia Cloud Computing (MC3) Research Network. We are grateful to Prof. Satish Srirama for his comments on improving the paper.

Author information

Corresponding author

Correspondence to Dawei Sun.

About this article

Cite this article

Sun, D., Yan, H., Gao, S. et al. Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams. J Supercomput 74, 615–636 (2018). https://doi.org/10.1007/s11227-017-2151-2

  • DOI: https://doi.org/10.1007/s11227-017-2151-2
