For years, when it came to financial transactions and other mission-critical data transfers, delivering high availability was an exercise in hardware redundancy, and it came from companies like IBM and Tandem (now part of Hewlett-Packard). It was expensive, because it dates back to a time when companies did not buy multiple computers for a single application. Assured availability required proprietary hardware and software to operate two systems so that when one stopped working the application would “fail over” to the other. When all you have is a hammer – hardware redundancy – every problem looks like a nail. But in today’s world of scale-out systems and cloud services, where anyone can swipe a credit card with a provider like Amazon Web Services (AWS) and get inexpensive web services at scale, there are more options to consider for configuring availability at the platform level. However, not all of the architectural options can guarantee the availability of an application on these larger cloud systems. So does a focus on uptime need to happen through expensive hardware or through less expensive software-based services?
Companies like Stratus Technologies, which pioneered always-on computing in its early days as “Stratus Computer”, are starting to deliver always-on solutions that provide redundancy through software running on off-the-shelf cloud hardware. This supports a concept called Software Defined Availability, or SDA (paper here), which fits nicely into the construct of the software-defined datacenter.
Years ago, redundancy for higher-end financial systems and transaction processing, such as ATM cash disbursement, was enforced through massive systems that cost millions to acquire and even more to keep in operation through expensive services. The financial model revolved around the idea that the highest-quality hardware, with built-in redundant components, was the best financial hedge against downtime. Expensive data communications links, coupled with a small number of expensive, redundantly configured systems capable of delivering the right number of “9’s” of uptime, were the only options available.
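To make the “9’s” concrete, here is a minimal Python sketch of the standard availability arithmetic. It converts an availability percentage into allowed downtime per year, and shows how running redundant copies raises combined availability. The function names are illustrative, and the redundancy formula assumes the copies fail independently and that failover is instantaneous, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant systems, where the service is up
    as long as at least one copy is up.

    Assumption: failures are independent and failover takes no time.
    """
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for label, a in [("two 9's", 0.99), ("three 9's", 0.999), ("five 9's", 0.99999)]:
        print(f"{label}: about {downtime_minutes_per_year(a):.1f} minutes of downtime per year")

    # Two commodity systems at 99% each, under the independence assumption.
    print(f"two redundant 99% systems: {parallel_availability(0.99, 2):.4%}")
```

Under those assumptions, five 9’s leaves a little over five minutes of downtime a year, and pairing two 99% systems yields roughly four 9’s. That arithmetic is the basic reason redundancy, whether in hardware or software, has always been the route to high availability.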
Times have changed. Merchant clouds like AWS are massive pools of redundant hardware resources designed to react to flexible business needs. When opportunity knocks, businesses grab that brass ring; they can’t afford to mortgage the future by being tied down to expensive hardware when the opportunity may be highly profitable but also fleeting. While many watched the Super Bowl this past weekend, the play on the field quickly paled in comparison to the annual festival of high-end ads. And with a blowout on the field, distracted viewers might engage with those advertisers at a higher-than-expected level. These types of opportunities can be massive – and short-lived – so high capacity with low system downtime is essential.
More and more, our society is focusing on fast impact – from business decisions to social media. Capitalizing on the short term is squarely in the wheelhouse of cloud-based service providers. And when it comes to capitalizing on changes in the business environment, fault tolerance is critical, because businesses expect to monetize these changing dynamics. As the market shifts, it is important to make sure that the business is “always on”, ready to capture that demand. Fast-moving software-based solutions might help businesses capitalize on opportunity, but without availability, that may become an exercise in futility. This is precisely why software-focused infrastructure needs greater reliability.
If you accept that a datacenter might need a lower-cost, higher-availability solution, how does IT actually go about implementing it?
There is the more manual process, in which the IT department takes ownership of designing an architecture that achieves the right level of availability at minimal cost (if it is even possible for them to hit the right metrics on both of those vectors). Or they can step outside of the data center and move to a cloud-based service like AWS, manually stitching together all of the compute, storage and networking with the right provisioning models to ensure the level of availability that they need. But this may prove to be just as problematic as trying the same exercise within their own data center.
A more realistic solution might come from a company like Stratus, which focuses its efforts on building highly available systems that work at the software level. By leveraging the open source OpenStack platform, Stratus’s solutions are far more flexible than many of the high availability solutions of the past, which focused mainly on the hardware or on a proprietary operating system. The two dimensions that SDA focuses on are the control plane (which directs the actions and helps enforce the redundancy) and the operations plane (which does all of the heavy lifting by processing the actual transactions).
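The split between a control plane and an operations plane can be illustrated with a small Python sketch. This is a generic toy model of software-level failover, not Stratus’s implementation or any OpenStack API; the `Worker` and `ControlPlane` classes and their methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Operations plane: a node that does the actual transaction processing."""
    name: str
    healthy: bool = True
    processed: list = field(default_factory=list)

    def process(self, txn: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.processed.append(txn)
        return f"{txn} handled by {self.name}"


class ControlPlane:
    """Control plane: directs work and enforces redundancy by routing each
    transaction to the first healthy worker in priority order."""

    def __init__(self, workers):
        self.workers = list(workers)

    def submit(self, txn: str) -> str:
        for worker in self.workers:
            if worker.healthy:
                return worker.process(txn)
        raise RuntimeError("no healthy worker available")


if __name__ == "__main__":
    primary, standby = Worker("primary"), Worker("standby")
    plane = ControlPlane([primary, standby])

    print(plane.submit("txn-1"))  # routed to the primary
    primary.healthy = False       # simulate a hardware failure
    print(plane.submit("txn-2"))  # transparently fails over to the standby
```

The caller only ever talks to the control plane, so the failure of the primary is invisible to the application. A real system would also need health detection, state replication between workers, and handling for in-flight transactions, all of which this sketch leaves out.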
With an agile control plane like the one Stratus can deliver, whether in the data center or through its cloud services, applications are “always on”, regardless of the status of the underlying hardware. This allows IT to leverage far more cost-effective industry-standard server platforms, which could help lower both acquisition and operational costs and free up capital for the next business opportunity down the line.