Failure is inevitable in cloud environments. Finding the root cause of a failure can be very complex and, at times, nearly impossible. Cloud customers differ both in their availability demands and in their willingness to pay for availability. In contrast to existing solutions that try to provide ever-higher availability in the cloud, we propose the Availability Knob (AK). AK provides flexible, user-defined availability in public clouds, allowing customers to express their availability needs to the cloud provider.
AK is complementary to existing high-reliability solutions, requires no hardware changes, and enables more efficient markets. This leads to reduced provider costs, increased provider profit, and improved user satisfaction compared to a public cloud with no ability to convey availability needs. We leverage game theory to derive incentive-compatible pricing, which enables AK to function not only without knowledge of the root cause of a failure but also in adversarial situations where users deliberately cause downtime. We develop a high-level stochastic simulator to test AK in a large-scale public cloud setup over long time periods. We also prototype AK in OpenStack to explore availability-API tradeoffs and to provide a grounded, real-world implementation.
Our results show that deploying AK reduces provider costs by more than 10% and improves user satisfaction. It also enables providers to set variable profit margins based on the risk of not meeting availability guarantees and on the disparity between availability supply and demand. Variable profit margins enable cloud providers to improve their profit by as much as 20%.