MTECH PROJECTS
Joint Scheduling of Data and Computation in Geo-Distributed Cloud Systems

Recent trends show that cloud computing is growing to span more and more globally distributed data centers. For geo-distributed data centers, there is an increasing need for scheduling algorithms that place tasks across data centers by jointly considering data and computation. Such scheduling must deal with wide-area distributed data, data sharing, WAN bandwidth costs and data center capacity limits, while also minimizing completion time. However, this kind of scheduling problem is known to be NP-hard. In this paper, inspired by real applications in the astronomy field, we propose a two-phase scheduling algorithm that addresses these challenges. The mapping phase groups tasks according to their data-sharing relations and dispatches the groups to data centers in one-to-one correspondence. The reassigning phase balances the completion time across data centers according to the relations between tasks and groups. We use the real China-Astronomy-Cloud model and typical applications to evaluate our proposal. Simulations show that our algorithm achieves up to 22% better completion time and effectively reduces the amount of data transferred compared with other similar scheduling algorithms.
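
The two-phase idea can be illustrated with a minimal Python sketch. This is not the paper's algorithm, whose details the abstract does not give; it is one plausible reading under stated assumptions. All function names (group_tasks, map_groups, rebalance) and the sample inputs are hypothetical. The sketch assumes that tasks sharing any dataset belong to the same group, that there are no more groups than data centers, that each group goes to the free data center already holding most of its data, and that completion time is simply the sum of task runtimes per data center. WAN transfer costs and capacity limits are ignored.

```python
from collections import defaultdict


def group_tasks(task_data):
    """Mapping phase, step 1: tasks that share any dataset form one group."""
    parent = {t: t for t in task_data}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_user = {}  # dataset -> first task seen that reads it
    for task, datasets in task_data.items():
        for d in datasets:
            if d in first_user:
                parent[find(task)] = find(first_user[d])
            else:
                first_user[d] = task

    groups = defaultdict(list)
    for t in task_data:
        groups[find(t)].append(t)
    return sorted(groups.values(), key=len, reverse=True)


def map_groups(groups, task_data, data_location, centers):
    """Mapping phase, step 2: each group gets its own data center."""
    if len(groups) > len(centers):
        raise ValueError("sketch assumes at most one group per data center")
    placement, free = {}, list(centers)
    for group in groups:
        needed = {d for t in group for d in task_data[t]}
        # Assumption: prefer the free center that already stores most of the data.
        best = max(free, key=lambda c: sum(data_location[d] == c for d in needed))
        free.remove(best)
        for t in group:
            placement[t] = best
    return placement


def rebalance(placement, runtime, centers):
    """Reassigning phase: move tasks from the busiest to the idlest center
    for as long as a move strictly lowers the busiest center's load."""
    placement = dict(placement)
    load = {c: 0.0 for c in centers}
    for t, c in placement.items():
        load[c] += runtime[t]
    while True:
        busiest = max(centers, key=load.get)
        idlest = min(centers, key=load.get)
        movable = [t for t, c in placement.items()
                   if c == busiest and load[idlest] + runtime[t] < load[busiest]]
        if not movable:
            return placement, max(load.values())
        # Pick the task whose move leaves the smaller of the two resulting peaks.
        t = min(movable, key=lambda k: max(load[busiest] - runtime[k],
                                           load[idlest] + runtime[k]))
        load[busiest] -= runtime[t]
        load[idlest] += runtime[t]
        placement[t] = idlest


# Hypothetical inputs: which datasets each task reads, where data lives, runtimes.
task_data = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"b"}, "t4": {"c"}}
data_location = {"a": "dc1", "b": "dc1", "c": "dc2"}
runtime = {"t1": 4, "t2": 3, "t3": 3, "t4": 2}
centers = ["dc1", "dc2"]

initial = map_groups(group_tasks(task_data), task_data, data_location, centers)
final, makespan = rebalance(initial, runtime, centers)
print(final, makespan)
```

In this toy input the mapping phase places the three data-sharing tasks on dc1, and the reassigning phase then moves one of them to dc2 to even out the load. A fuller treatment would weigh that move against the cost of transferring the task's input data over the WAN, which the sketch deliberately leaves out.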