MTECH PROJECTS
Cost-Efficient High-Performance Internet-Scale Data Analytics over Multi-cloud Environments

To analyze data distributed across the world, one can use distributed computing power to exploit data locality and achieve higher throughput. The multi-cloud model, a composition of multiple clouds, can provide cost-effective computing resources to process such distributed data. As multi-cloud becomes increasingly accessible to cloud users, the use of MapReduce/Hadoop over multi-cloud is emerging. However, existing work has two fundamental issues. First, it focuses mainly on maximizing throughput by improving data locality, and the perspective of cost optimization is missing. Second, conventional centralized optimization methods are unlikely to scale well in multi-cloud environments because of the highly dynamic nature of those environments. We plan to address the first issue by formalizing an optimization framework for MapReduce over multi-cloud that includes virtual machine and data transfer costs, and the second by creating decentralized resource management middleware that performs multi-criteria (cost and performance) optimization. This paper reports the progress we have made so far in these two directions.
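To make the cost perspective concrete, the following is a minimal sketch of the kind of cost model such a framework might formalize, assuming total cost is simply virtual machine time plus data transfer out of the cloud that holds the data. The class `Cloud`, the functions `job_cost` and `cheapest_target`, and all prices are hypothetical illustrations and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    """A cloud with hypothetical, illustrative prices (not real provider rates)."""
    name: str
    vm_price_per_hour: float    # assumed USD per VM-hour
    egress_price_per_gb: float  # assumed USD per GB transferred out of this cloud


def job_cost(data_gb: float, source: Cloud, target: Cloud, hours_per_gb: float) -> float:
    """Cost of processing data stored in `source` on VMs rented in `target`.

    Assumption: VM time grows linearly with data size, and transfer is
    charged only when the data has to leave the source cloud.
    """
    vm_cost = data_gb * hours_per_gb * target.vm_price_per_hour
    if source.name == target.name:
        transfer_cost = 0.0
    else:
        transfer_cost = data_gb * source.egress_price_per_gb
    return vm_cost + transfer_cost


def cheapest_target(data_gb: float, source: Cloud, clouds: list, hours_per_gb: float) -> Cloud:
    """Pick the cloud that minimizes VM cost plus data transfer cost."""
    return min(clouds, key=lambda c: job_cost(data_gb, source, c, hours_per_gb))


cloud_a = Cloud("A", vm_price_per_hour=0.10, egress_price_per_gb=0.09)
cloud_b = Cloud("B", vm_price_per_hour=0.05, egress_price_per_gb=0.12)
clouds = [cloud_a, cloud_b]

# Light computation: keeping the job next to the data is cheapest.
light = cheapest_target(100, cloud_a, clouds, hours_per_gb=0.5)

# Heavy computation: cheaper VMs elsewhere outweigh the transfer charge.
heavy = cheapest_target(100, cloud_a, clouds, hours_per_gb=4.0)

print(light.name, heavy.name)
```

Under these assumed prices, the data-local cloud wins for the light job, while the heavy job is cheaper on the remote cloud despite the transfer charge. This is the gap that a purely locality-driven scheduler would miss.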