Data Is Escaping Your Data Center And What You Can Do About It

While I do have a flair for the dramatic, I don’t mean to alarm anyone by saying this: your data is escaping from the data center. The volume and velocity of data are increasing so rapidly, and data is being generated and consumed so far from the traditional IT data center, that there is no other conclusion to draw. Data is no longer just in the data center. It has long been a fundamental tenet of data processing that a business-critical application requires its data to be stored close to the compute, memory, and servers hosting that application. That tenet is evolving. As some workloads migrate to the cloud and others migrate in the other direction, towards the edge, there are increasingly times when data needs to be both stored and processed far away from your data center. I wrote about this recently when talking about why data matters in 2018.

This new way of thinking about workload placement naturally requires a fresh examination of data center storage architecture. It’s no longer as simple as connecting storage and compute in neighboring racks to a Fibre Channel or 10-Gigabit Ethernet network and hoping that data is available when it is needed. There is a transformation occurring in how an organization’s information architecture influences its data center architecture. At the same time, the responsibilities and needs of IT staff are changing. In place of teams of specialists focused on a single IT element like storage or virtualization, IT professionals are broadening into roles that benefit from a deeper understanding of the data being processed. The IT teams of tomorrow need the skills to map the needs of applications to the underlying storage, compute, and networking that serve that data, whether it lives inside or outside the data center.

Below, I’ll give you two examples that demonstrate how data is escaping the data center today, and how those changes are impacting how IT organizations need to think about data.

Cloud and hybrid cloud

Cloud has become a reality for enterprise computing. Public cloud is pervasive, but it has not become the miracle solution envisioned at the top of the hype cycle. Rather than migrate applications wholesale from data centers to the public cloud, organizations are choosing to place their applications where it makes the most sense given technical, performance, and business considerations. This hybrid cloud approach has caused workloads to become more portable, with business-critical applications migrating between cloud and corporate data centers.

Migrating processing workloads between servers, whether public cloud or local, is not a complicated problem. AWS offers migration capabilities through a variety of partners, including Microsoft Corporation, Intel Corporation, VMware, and SAP. Microsoft Azure has hybrid-cloud migration tools integrated into its offering.

The data is a different story. Moving an application from the data center to the cloud could force the migration of terabytes of data that support that application. Migrating data from a traditional storage array to more nebulous cloud options requires a clear understanding of the issues involved. Even over the fastest broadband links available today, terabytes of data can take an unacceptably long time to move. To manage this, AWS offers a service named “Snowmobile,” in which AWS brings a truck full of disk drives to your data center. The data is copied from your on-site storage arrays to AWS storage in the truck’s trailer, which AWS then drives to the cloud region of your choice, where the data is copied into its cloud.
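To put rough numbers on why bulk transfers over a network link are slow, here is a minimal back-of-the-envelope sketch in Python. The function name, the sample sizes, and the 80% link-efficiency figure are all assumptions chosen for illustration; they are not measurements or vendor figures.

```python
def transfer_days(terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to move `terabytes` over a `link_gbps` link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s) and
    that only `efficiency` of the nominal link rate is usable in practice
    (the 0.8 default is an assumed figure, not a measured one).
    """
    if terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, rate > 0, efficiency in (0, 1]")
    bits_to_move = terabytes * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    seconds = bits_to_move / usable_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical data set sizes (TB) and link speeds (Gbps).
    for size_tb, gbps in [(10, 1), (100, 1), (100, 10), (1000, 10)]:
        days = transfer_days(size_tb, gbps)
        print(f"{size_tb:>5} TB over {gbps:>2} Gbps: {days:.1f} days")
```

Under these assumptions, 100 TB over a dedicated 1 Gbps link works out to roughly 11.6 days of continuous transfer, which is the kind of figure that makes physically shipping storage worth considering.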

Less dramatic than a tractor-trailer full of disk drives, there is also a range of hybrid-cloud storage solutions from every tier-1 storage supplier that match the needs of many applications. Whether you are building your infrastructure with technology from Pure Storage Inc, NetApp, Hewlett Packard Enterprise, or Dell EMC, there is a solution available.

I’m not going to solve your hybrid-cloud problem in this article. Instead, I’m pointing out that applications and the data that serves those applications have very different considerations that need to be addressed when an IT team decides to balance workloads between the cloud and an on-premises data center. Data can be large and expensive to move around. You don’t want to make mistakes. Understanding the information needs of your organization’s applications becomes a critical skill for every IT architect and specialist alike.

Edge, fog, and the Internet of Things

The Internet-of-Things (IoT) is about turning millions of points of data, collected from intelligent devices located well outside a corporate datacenter, into actionable insights.  That is a broad definition, but IoT is a broad topic.

As an example, those millions of points of data might be part of a traffic control system. Data arriving from a combination of intelligent traffic control cameras and road sensors feeds into computers that make decisions about optimal traffic signal patterns. IoT data could also come from the many dozens of pieces of equipment on an oil rig, with actionable insights about the efficiency of the platform generated on site.

There are hundreds of examples of the Internet-of-Things, and they share many common attributes. One of those attributes is that data is collected, and in many instances consumed, well outside the data center. The traffic control system, for example, may not be able to tolerate the latency of sending data to a remote data center for processing and still make the lights change on time. It also needs to survive a loss of communication with that remote data center, which forces processing to occur locally.
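The two constraints in the traffic example, a latency deadline and tolerance for a lost link, can be sketched as a simple placement decision. This is only an illustration: the function name, parameters, and millisecond values are hypothetical and are not drawn from any real traffic system or standard.

```python
def choose_site(link_up: bool, round_trip_ms: float,
                remote_compute_ms: float, deadline_ms: float) -> str:
    """Pick where to process a sensor reading (hypothetical helper).

    Processing goes to the remote data center only if the link is up AND
    the network round trip plus remote compute time fits within the
    deadline. In every other case the edge node must handle it locally.
    """
    if link_up and round_trip_ms + remote_compute_ms <= deadline_ms:
        return "remote"
    return "local"


if __name__ == "__main__":
    # Illustrative values only: a 100 ms deadline for a signal decision.
    print(choose_site(link_up=True, round_trip_ms=20, remote_compute_ms=30, deadline_ms=100))
    print(choose_site(link_up=True, round_trip_ms=80, remote_compute_ms=30, deadline_ms=100))
    print(choose_site(link_up=False, round_trip_ms=20, remote_compute_ms=30, deadline_ms=100))
```

The point of the sketch is that the local path has to exist regardless of how good the network usually is, which is why storage and compute end up at the edge.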

Computing near the data is called “edge computing.”  The leading industry group working on defining standards for computing at the edge is the OpenFog Consortium.  This consortium was formed by Intel, Cisco Systems, Dell EMC, and others. Today, the OpenFog Consortium has nearly 70 participating companies all driving towards a standard architecture for computing near the edge.

A vital component of the OpenFog architecture is the inclusion of persistent storage for sensor and other data that is to be processed at the edge. You will see architectures where persistent memory and SSDs are deployed to solve this problem. Where fast decision making at the edge is required, expect architectures that blend traditional 3D NAND-based SSDs with faster Intel Optane devices for very low-latency, high-bandwidth processing.

Whether the edge computing model coalesces around the reference architecture being proposed by the industry as part of the OpenFog Consortium or pursues some other path, the reality doesn’t change. Data is being generated well outside the data center, at the edge, and is being both stored and processed there. Data at the edge raises a host of questions about data retention, durability, fault resilience, and availability. IT organizations know how to address these issues, making it imperative that IT participate in incorporating these new models into their data, and related storage, strategies.

IT has enterprise-savvy partners it can engage today to have that conversation. I’ve written recently about both Intel Corporation’s Xeon D and Dell Technologies’ IQT efforts around these topics. Hewlett Packard Enterprise is another major IT vendor that understands the confluence of enterprise and the Internet of Things. Engage these partners, and ensure that you have a data strategy that includes edge computing and that makes sense.

Wrapping up

Data is escaping the data center, whether it’s going to the edge or the cloud. An organization’s IT teams understand data better than any other group in the enterprise and must evolve with the changing models. I’ve written before about defining a data strategy and engaging partners such as Intel, Hewlett Packard Enterprise, and Dell EMC to help you flesh it out. All of these vendors bridge storage, compute, and edge computing.

My advice to enterprises: engage these partners, have the conversation, and make sure that your corporate data strategy includes data that has escaped the data center. The integrity of your data depends on it.

Note: This blog contains substantial contributions by Steven McDowell, Moor Insights & Strategy senior analyst of storage and storage technologies.