RESEARCH PAPER: Software Defined Availability (SDA) - Critical for Managing Datacenter Scale

Many people think of web and cloud services such as Amazon Web Services (AWS) as “always available”. However, these services have poor availability compared with the high availability (HA) and fault tolerant (FT) IT services that are deployed for processes that must not fail. One example of such a process, handled by FT systems, is the part of a credit card transaction flow in which one bank has subtracted funds but the other bank has not yet added them. Processes like these have tended to run on expensive, explicitly HA hardware. Now, the massively replicated hardware infrastructure underlying hyperscale services has the potential to lower the cost of HA solutions by shifting the focus away from expensive, explicitly HA hardware and toward mainstream commercial hardware with software-based availability.

Table of Contents

  • Executive Summary
  • Moving from Uptime to Downtime and Availability
  • Hardware Deployed at Scale Fails at Scale
  • “Software Defined” is an Abstract Concept
  • Software Defined Availability: How it Works
  • Conclusion
  • Table 1: Availability vs. Unavailability

You can download the paper here.

Companies Referenced:

  • Amazon
  • Baidu
  • Facebook
  • Google
  • Microsoft
  • Netflix
  • Stratus
  • Tencent
  • VMware