Failure in cloud services is inevitable. Finding the root cause of a failure can be complex and at times nearly impossible. Existing solutions aim to provide ever-higher availability in the cloud. However, different cloud users have different availability demands and differ in their willingness to pay for uptime.
We proposed the Availability Knob (AK) to provide flexible, user-defined availability in public clouds, allowing users to express their availability needs to the cloud provider. AK complements existing high-reliability solutions, requires no hardware changes, and enables more efficient markets. Compared to a public cloud with no ability to convey availability needs, this leads to reduced provider costs, increased provider profit, and improved user satisfaction.
We leveraged game theory to derive incentive-compatible pricing, which enables AK to function not only with no knowledge of the root cause of failure but also in adversarial situations where users deliberately cause downtime. We developed a high-level stochastic simulator to test AK in a large-scale public cloud setup over long time periods. We also prototyped AK in OpenStack to explore availability-API tradeoffs and to provide a grounded, real-world implementation.
Our results show that deploying AK reduces provider costs by more than 10% and improves user satisfaction. It also enables providers to set variable profit margins based on the risk of not meeting availability guarantees and on the disparity between availability supply and demand. Variable profit margins enable cloud providers to improve their profit by as much as 20%.
SoCC '16 Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016