As companies adapt to external change, transition to new ways of working and wrestle with data growth, effective data management becomes a critical factor in digital transformation. By effective data management, I mean all the stages of managing data as a valuable asset, starting with data collection and extending to processing, governance, sharing and analysis.
Enabling the consistent and reliable flow of data across people, teams and business functions is the lifeblood of an organization and the key to competitiveness and the ability to innovate. To get the most accurate barometer on how companies are addressing data challenges, I turned to someone who lives on the customer front lines—Otho Lyon, vice president of global support, Cloudera.
Adaptability — crucial for cloud deployments
Most Cloudera customers live in a hybrid world. I believe hybrid and multi-cloud models will continue to prosper as companies take advantage of the strengths and capabilities of different cloud providers.
Keep in mind that the cloud does not reduce costs. I haven’t talked to a single enterprise that said, “Cloud is reducing my costs.” Instead, customers see the cloud as the best platform for data modernization and analytics due to its scalability, flexibility and advanced tools and resources. The cloud is everywhere: in the data center, the public cloud, the sovereign cloud, and even in the company you just acquired.
Lyon emphasizes that successful data initiatives will drive value across many use cases, including revenue management, customer engagement, cross-selling and personalization, marketing optimization, risk management and decision-making. One of the main goals for data managers is to support these data initiatives by enabling access to analytics across hybrid and multi-cloud models.
Yet therein lies the challenge—knowing where the data resides in a multi-cloud environment.
Lyon outlined a typical environment he sees with customers today. There will likely be data on-premises for various reasons, such as security, governance and regulatory requirements. Perhaps customer data resides in AWS S3 storage accessed with Amazon Athena or Redshift. Google Analytics is used to measure advertising ROI via Google BigQuery. And finally, let’s say store data from a recently acquired company sits in Microsoft Azure and is accessed with Azure Synapse.
This example demonstrates the challenges facing data leaders: keeping data in sync, ensuring governance and security, and managing and orchestrating across all these platforms and providers while maintaining a good audit history.
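To make the "where does the data reside" problem concrete, here is a minimal Python sketch of a location catalog for the environment Lyon describes. It is an illustration only, not a Cloudera feature or API: the `DatasetLocation` class, the dataset names and the `governed` flag values are all hypothetical, chosen to mirror the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLocation:
    """One catalog entry: where a dataset lives and how it is queried."""
    name: str
    platform: str        # where the data is stored
    access_engine: str   # how the data is queried
    governed: bool       # hypothetical flag: is a governance policy recorded?


# Illustrative entries only; the names and flag values are assumptions.
CATALOG = [
    DatasetLocation("regulated_records", "on-premises", "local warehouse", True),
    DatasetLocation("customer_data", "AWS S3", "Amazon Athena", True),
    DatasetLocation("advertising_roi", "Google Cloud", "BigQuery", False),
    DatasetLocation("acquired_store_data", "Microsoft Azure", "Azure Synapse", False),
]


def locate(catalog, name):
    """Return the entry for a dataset, or None if it is not catalogued."""
    return next((d for d in catalog if d.name == name), None)


def ungoverned(catalog):
    """List the names of datasets with no governance policy recorded."""
    return [d.name for d in catalog if not d.governed]
```

Even a simple inventory like this answers two of the questions raised above: which platform holds a given dataset, and which datasets still lack a recorded governance policy.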
A data fabric is required to maintain consistency
Lyon mentioned that the threat of a data breach is top of mind for data leaders. Data leaders must be confident that data is secured and appropriately governed regardless of location. Enterprises are under tremendous regulatory scrutiny, with new data localization and data sovereignty rules emerging worldwide. To operate confidently in this environment, enterprises seek ways to apply security and governance consistently wherever their data lives.
Cloudera addresses these challenges with a data fabric built on a modern data architecture. Cloudera Data Platform (CDP) supports hybrid, multi-cloud data management and is open, portable and secure, so that data can be orchestrated and managed across multi-cloud and on-premises environments.
Data is the new oil
Most successful companies recognize data as a strategic resource. In particular, data-driven decision-making models are essential for companies to become more efficient and optimized.
Lyon noted that the most significant issue he is helping customers with is retrieving data stuck in silos across the organization. Data siloed across various on-premises application systems, databases, data warehouses and SaaS applications makes it difficult to support new use cases for analytics or machine learning.
Data leaders are now looking to centralize all data within a “lakehouse” architecture (combining the best aspects of data lakes and data warehouses) to enable new use cases and manage the growth and complexity of data. However, the tricky part is to move data from various systems into the lakehouse efficiently.
Cloudera DataFlow is a data ingestion solution built for hybrid data, available as a cloud-native Apache NiFi service in CDP. DataFlow covers many use cases, such as batch, event-driven, edge, microservices and streaming.
DataFlow eliminates ingestion silos by connecting to any data source with any structure, processing the data and delivering it anywhere using low-code development. More than 450 connectors and processors support DataFlow across the ecosystem of hybrid cloud services, including data lakes, lakehouses, cloud warehouses and on-premises sources.
Uncertainties and change in the labor market
Anyone following the business news knows that there are many uncertainties in the labor market and the broader economy. It seems like every day has an announcement of a giant layoff, particularly in technology. The labor market pendulum can swing from one extreme to the other even in the best of times; today, that tendency is becoming more complicated as it combines with new norms related to remote work—and all of this is placing greater importance on data management. Data scientists and leaders remain in high demand even within companies and industries that are reducing the labor force.
Technological innovation in data management will continue to be critical because even though there are plenty of job openings for data scientists and data leaders, there are not nearly enough qualified people to fill them. Hence, artificial intelligence (AI) and machine learning (ML) are becoming increasingly important. Also, because there will be more remote workers than in the past, data leaders must adjust data processes and workflows.
Modern data architectures such as data lakehouses and data fabrics are essential to driving business efficiencies across a hybrid and multi-cloud model. In the future, the separation of the winners from the laggards will hinge on the speed and efficiency of predictive analytics to support faster and better decisions—all of which inevitably relies on rock-solid data management.
Everyone should consider moving to a hybrid data platform to manage the entire life cycle of data analytics and machine learning. And the platform must be open and interoperable to ease sharing and enable self-serve functionality, as with the Cloudera Data Platform (CDP). Enterprises require a comprehensive data management strategy that includes technology, best practices and help from folks like Otho Lyon!