Another year flew by, and we had the second Google Data Cloud Summit. The multi-week event is a comprehensive overview of the “state of Google Cloud.” In this article, I will describe some of the most important announcements from this annual event and add some opinion.
A single unified data platform
Before diving into Google’s new announcements, below is a primer on Google’s unified data platform.
Google’s Data Cloud is a single platform covering the end-to-end value chain of data services, from data origination to insight. Google intentionally decouples the data plane from the workloads and the people interacting with it, so productivity stays high from data engineering to business analytics. If you’re looking for a one-stop shop, this is a set of services you have to consider. Data, analytics, and ML are Google Cloud’s strongest suits, and large enterprises also want consistent security, governance, lineage, management, and automation across them.
Spanner, a SQL database service, is processing more than two billion queries per second at peak. Bigtable, a NoSQL database service, processes over five billion requests per second at peak. BigQuery is a serverless, scalable multi-cloud data warehouse. Google now has hundreds of customers whose datasets have grown beyond a petabyte. BigQuery customers analyze more than 110 terabytes of data per second, an incredible number. Vertex AI is Google’s machine learning platform, designed for high productivity, which Google claims requires 80% less code to train a model compared to other machine learning platforms.
The single platform allows these services to be combined. BigQuery and Vertex AI in a single data plane can increase productivity because less data engineering work is needed. Google says customers have seen a two-and-a-half times increase in deployed machine learning models since Google brought these technologies together. Looker is a business intelligence, embedded analytics, and data application platform that reduces reliance on technical teams with a robust semantic layer. Finally, there is Firestore for application development, with a community of more than 250,000 active developers.
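To make the BigQuery-plus-ML combination concrete: BigQuery ML lets you train a model with plain SQL, right where the data lives, with no separate data-engineering pipeline. The sketch below only composes the SQL statement (the dataset, model, and table names are hypothetical placeholders); in practice you would submit it with the `google-cloud-bigquery` client.

```python
# Sketch: training a model where the data already lives, via BigQuery ML.
# All dataset/model/table names are hypothetical placeholders.

def build_create_model_sql(dataset: str, model_name: str, source_table: str) -> str:
    """Compose a BigQuery ML CREATE MODEL statement (logistic regression)."""
    return f"""
CREATE OR REPLACE MODEL `{dataset}.{model_name}`
OPTIONS(model_type='logistic_reg', input_label_cols=['churned']) AS
SELECT * FROM `{dataset}.{source_table}`
""".strip()

sql = build_create_model_sql("analytics", "churn_model", "customer_features")
print(sql)
# In practice: google.cloud.bigquery.Client().query(sql).result()
```

The point is less the SQL itself than the workflow: the training data never leaves the warehouse, which is exactly the productivity gain the unified data plane promises.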
Google’s Data Cloud is an impressive set of unified services, but knowing how customers utilize those services makes it more real for me.
UPS, the shipping giant, has transformed logistics with intelligent forecasting. By analyzing more than one billion data points every day to improve route planning, UPS saves an estimated 400 million dollars annually in fuel costs. Carrefour, the French retailer, has reimagined the customer experience based on a data lake holding 700 terabytes of data, which handles 100 million API calls a month. Walmart has modernized business operations, increasing process efficiencies to close the books in less than three days. HSBC in the banking sector has created a new experience for customers while maintaining the highest security and compliance requirements.
Like I said, Google is strong in “data”.
Google’s BigLake “data lakehouse”
A data lake consists of raw data; the purpose is not yet defined. A data warehouse contains structured, filtered data already processed for a specific purpose.
We all know that data are growing at an incredible rate. Having data distributed across multiple locations, including warehouses and data lakes, adds complexity and cost.
Google’s solution is a new service called “BigLake.” BigLake is a storage engine designed to unify data across data lakes and data warehouses. Essentially, data can be analyzed without worrying about the underlying storage format. This architecture is a “data lakehouse” for those familiar with the term.
BigLake will support open file formats such as Parquet, open-source processing engines like Apache Spark or Beam, and various table formats, including Delta Lake and Iceberg. It provides a single place to access data across a multi-cloud environment, including Google Cloud Storage, Amazon S3, and Microsoft Azure.
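To illustrate the “one query surface over many storage locations” idea, the sketch below composes DDL for an external table over Parquet files in object storage. The table, connection, and bucket names are hypothetical, and this is a simplified sketch of the BigLake-style DDL shape, not a definitive reference; the same pattern would point at `gs://`, `s3://`, or `azure://` URIs.

```python
# Sketch: one query surface over data in different clouds and formats.
# Table, connection, and bucket names below are hypothetical placeholders.

def external_table_ddl(table: str, uris: list[str], fmt: str = "PARQUET") -> str:
    """Compose DDL for an external table over files in object storage."""
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `my_connection`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )

ddl = external_table_ddl(
    "lake.events",
    ["gs://my-bucket/events/*.parquet"],  # could equally be s3:// or azure://
)
print(ddl)
```

Once such a table exists, queries against it look identical to queries against native warehouse tables, which is the lakehouse promise in a nutshell.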
With the reality of more and more data distributed in more places on the edge, I see BigLake as a very important product. And it is an “embrace and extend” play that removes objections for Google Cloud to be the enterprise’s “go-to” data service.
A single development environment for data science workflows
Vertex AI is an end-to-end machine learning (ML) workflow platform where data scientists and ML engineers build, train, and test ML models without any help from the IT team. It competes head-to-head with AWS’s SageMaker.
Vertex AI Workbench is a new addition that provides a single interface so that teams can have common toolsets across data analytics, data science, and machine learning. It integrates with a full suite of AI and data products, including BigQuery, Serverless Spark, and Dataproc. At its core, Vertex AI Workbench is Jupyter notebooks, the open-source web application, delivered as a managed service.
The Vertex AI Model Registry, currently in preview, will be a searchable repository for ML models. It will provide a single overview of all models so teams can organize them, track versions, and train new ones.
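To make the idea concrete, here is a toy, in-memory sketch of what a model registry tracks: names, versions, artifact locations, and searchable metadata. This is purely conceptual and not the Vertex AI API; the real service is used through the Vertex AI SDK, and all names and URIs below are hypothetical.

```python
# Toy in-memory model registry: organize models and track versions.
# Conceptual sketch only; this is NOT the Vertex AI Model Registry API.
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    _models: dict = field(default_factory=dict)  # name -> list of version records

    def register(self, name: str, artifact_uri: str, **metadata) -> int:
        """Add a new version of a model; returns the new version number."""
        versions = self._models.setdefault(name, [])
        version = len(versions) + 1
        versions.append({"version": version, "artifact_uri": artifact_uri, **metadata})
        return version

    def latest(self, name: str) -> dict:
        return self._models[name][-1]

    def search(self, keyword: str) -> list:
        return [n for n in self._models if keyword in n]

registry = ModelRegistry()
registry.register("churn-classifier", "gs://models/churn/v1")  # hypothetical URI
v = registry.register("churn-classifier", "gs://models/churn/v2", framework="xgboost")
print(v)                        # 2: a new version of an existing model, not a new model
print(registry.search("churn"))
```

The key design point a registry enforces is that retraining produces a new *version* under the same model name, so downstream consumers can always resolve “latest” without hunting through artifact storage.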
This is all about increasing cross-team developer productivity: instead of redeveloping everything, all the time, each developer can build on models that have already been created.
Speed and convenience with trust and security
Looker’s semantic model is a single version of the truth across all data. Google’s new integration allows the Looker semantic model to be used within Data Studio reports, allowing people to use the same tool to create reports that rely on both ad-hoc and governed data.
Users will now be able to access and import governed data from Looker within the Data Studio interface and build visualizations and dashboards for further analysis.
Ensuring data portability and accessibility between multiple platforms
Google announced the establishment of a Data Cloud Alliance with other founding partners, including Confluent, Databricks, Dataiku, Deloitte, Elastic, Fivetran, MongoDB, Neo4j, Redis, and Starburst.
Each member commits to providing infrastructure, APIs, and integration support to ensure data portability and accessibility between multiple platforms and products across various environments. Members will also collaborate on new, standard industry data models, processes, and platforms.
The future will tell, but this could be one of the most important announcements of the show. Unless you hold all the data yourself, like certain database and software companies out there, interoperability is critical. While any database can batch export for ETL, having an API makes it so much easier.
Customer confidence that partner solutions work with BigQuery
Finally, Google launched the BigQuery Validation Program with 25 partners. The validation program will test the quality of a partner’s BigQuery integration, ensuring mutual customers have everything needed to implement a solution successfully. This is a good thing: not all integrations were equal in quality or speed, and if the bar isn’t set at a respectable level, the notion of “BigQuery integration” loses clout.
Google has a stated aim to remove all “limits of data.” A lofty goal indeed! These recent announcements are incremental steps towards achieving that goal.
Quite frankly, “data” is Google Cloud’s strongest suit, whether it’s analytics, AI, ML, or security. It’s where GCP “lands and expands” in its customer motions. It’s wild: I’d say half of the conversations I had in the past year with large enterprises involved some kind of “data” project with Google. This makes a lot of sense when you think about it. Google was arguably the first great example of “big data”; just look at Search and Gmail when they launched. When Thomas Kurian came in to lead Google Cloud as CEO, I remember the #1 thing he would talk about after “we don’t use your data for advertising” was that he wanted to “meet customers where they are.” This meant not only providing homes for those SAP workloads (and more), but more importantly, creating data tools to pull data from where it is, derive intelligence from it, and put any transformed data back where it came from. Google is good at “data.”
The Google Data Cloud Summit is an event for anyone using or considering Google Cloud technologies. If you missed the broadcast, the content will be available for on-demand viewing here.