ABSTRACT

Online Algorithms for Timely Migration of Big Data into the Best Cloud Data Center
    Problem Formulation
        Decision Variables
        Costs
        The Offline Optimization Problem
    Two Online Algorithms
        The Online Lazy Migration (OLM) Algorithm
        The Randomized Fixed Horizon Control (RFHC) Algorithm
Online Algorithms for Uploading Deferral Big Data to the Cloud
    Problem Formation
    The Single ISP Case
        The Primal and Dual Cost Minimization LPs
        Online Algorithms
    The Cloud Scenario
Conclusion
References

Cloud computing provides agile and scalable resource access in a utility-like fashion, especially for the processing of Big Data. An important open problem here is to efficiently move data from different geographical locations to a cloud for processing. This chapter examines two representative scenarios in this picture, and introduces online algorithms to achieve timely, cost-minimizing upload of Big Data into the cloud. First, we focus on uploading massive, dynamically generated, geo-dispersed data into a cloud encompassing disparate data centers, for processing using a centralized MapReduce-like framework. A cost-minimizing data migration problem is formulated, and two online algorithms are given: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. Second, we discuss how to minimize the bandwidth cost for uploading deferral Big Data to a cloud, for processing by a (possibly distributed) MapReduce framework, assuming that the Internet Service Provider (ISP) adopts the MAX contract pricing scheme. We first analyze the single ISP case and then generalize to the MapReduce framework over a cloud platform. In the former, we review a Heuristic Smoothing algorithm whose worst-case competitive ratio is proved to fall between 2 − 1/(D + 1) and 2(1 − 1/e), where D is the maximum tolerable delay. In the latter, we employ the Heuristic Smoothing algorithm as a building block, and demonstrate an efficient distributed randomized online algorithm, achieving a constant expected competitive ratio.
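
To make the quantities in the second scenario concrete, the short Python sketch below evaluates the two competitive-ratio expressions quoted above for a given D, and illustrates why tolerating delay can lower the bill under peak-based charging. It rests on assumptions that are not stated in this abstract: we take the MAX contract to charge in proportion to the largest per-slot traffic volume in a charging period, and the function names (`ratio_expressions`, `peak_charge`, `spread_evenly`) and the `unit_price` parameter are hypothetical. The even-spreading rule is a naive stand-in used only for illustration; it is not the Heuristic Smoothing algorithm reviewed in this chapter.

```python
import math


def ratio_expressions(max_delay):
    """Evaluate 2 - 1/(D + 1) and 2(1 - 1/e) for a maximum tolerable delay D."""
    if max_delay < 0:
        raise ValueError("max_delay must be non-negative")
    first = 2 - 1 / (max_delay + 1)
    second = 2 * (1 - 1 / math.e)
    return first, second


def peak_charge(traffic_per_slot, unit_price=1.0):
    """Assumed MAX-contract charge: unit price times the peak per-slot volume."""
    return unit_price * max(traffic_per_slot, default=0.0)


def spread_evenly(arrivals, max_delay):
    """Naive illustration (not Heuristic Smoothing): split the data arriving in
    slot t into D + 1 equal parts, uploaded in slots t, t + 1, ..., t + D."""
    uploads = [0.0] * (len(arrivals) + max_delay)
    for t, volume in enumerate(arrivals):
        share = volume / (max_delay + 1)
        for k in range(max_delay + 1):
            uploads[t + k] += share
    return uploads


if __name__ == "__main__":
    for d in (1, 3, 10):
        first, second = ratio_expressions(d)
        print(f"D={d}: 2 - 1/(D+1) = {first:.4f}, 2(1 - 1/e) = {second:.4f}")

    burst = [10.0, 0.0, 0.0, 0.0]  # a single burst of data in the first slot
    print("immediate upload charge:", peak_charge(burst))
    print("deferred upload charge: ", peak_charge(spread_evenly(burst, 3)))
```

For every D ≥ 1 the first expression is at least 1.5, while the second is a constant of about 1.264, so the two values bracket a nonempty interval. In the burst example, every unit of data is still uploaded within D = 3 slots of its arrival, yet the peak per-slot volume, and hence the assumed charge, drops from 10 to 2.5.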