Mechanisms and Architectures for Tail-Tolerant System Operations in Cloud



Lu, Qinghua; Zhu, Liming; Xu, Xiwei (Sherry); Li, Shanshan; Bass, Len; Zhang, Weishan


2014-06-17


Conference Material


6th USENIX Workshop on Hot Topics in Cloud Computing


Philadelphia, US


Conducting system operations (such as upgrade, reconfiguration, and deployment) for large-scale systems in the cloud is complex and error prone. These operations, performed on hundreds to thousands of nodes, rely heavily on still-unreliable cloud infrastructure APIs to complete. The inherent uncertainties and inevitable errors cause a long tail in the completion time of operations. In this paper, we tolerate the long tail by proposing a new set of mechanisms at the cloud provisioning API level and deployment tactics at the architecture level for system operations. We implement our mechanisms as a tail-tolerant wrapper around Amazon cloud APIs. Our initial evaluation shows that the mechanisms and deployment tactics can effectively reduce the long tail.


API, Reliability, Fault-Tolerant Design, Deployment Architecture, Cloud Computing


https://www.usenix.org/conference/hotcloud14


nicta:7924


Lu, Qinghua; Zhu, Liming; Xu, Xiwei (Sherry); Li, Shanshan; Bass, Len; Zhang, Weishan. Mechanisms and Architectures for Tail-Tolerant System Operations in Cloud. In: 6th USENIX Workshop on Hot Topics in Cloud Computing; Philadelphia, US. 2014-06-17.


