Data is crude. Data is the new currency. Data drives the modern business. Add these clichés to the long list of those used to describe the modern business. And while these clichés may be overused, they are true. Data transformed into intelligence drives the modern business. And the business that can glean insights from intelligence fastest is best poised to win.
But not all intelligence is created equal. Intelligence derived from a subset of data sources across the enterprise is incomplete and can lead to bad decisions that harm a business rather than help it. And unfortunately, many organizations fail to see the big picture because essential data sits in many departmental databases and never gets exported into an analytics platform alongside other data sources.
It is this scenario that led Oracle to introduce MySQL HeatWave. MySQL HeatWave is a fully managed MySQL service from Oracle that enables organizations to run OLTP and analytics from a single database, eliminating extract, transform, load (ETL) processes and the need to maintain multiple databases.
MySQL HeatWave was developed as a genuine cloud service on a continuous innovation cycle, and the MySQL HeatWave team quickly responded to customer feedback around automation with the release of MySQL Autopilot in HeatWave in August 2021. MySQL Autopilot uses advanced machine learning techniques to automate HeatWave, make it easier to use, and accelerate performance. It improves scalability across provisioning, data loading, query execution, and failure handling, enabling customers to focus on the business instead.
As if on cue, Oracle has announced the third major release of MySQL HeatWave in 15 months. The new HeatWave ML brings machine learning, including model training, inference, and full model explainability, inside the database at no additional cost. But how automated exactly is HeatWave ML, and can it bring machine learning to the masses? We'll uncover this in the following few sections.
First, the setup – what is AutoML?
Machine learning (ML) is critical to any business looking to automate its processes. For those unfamiliar with ML, there are eight basic steps to establishing a functioning ML model:
1. Define the problem
a. Also, define success
2. Find your data
3. Collect and prep your data
4. Build your model & begin training
5. Look at the results and establish baselines
6. Put the model in operational beta – make sure it operates as intended
7. Go live
8. Iterate as necessary
Anybody who has spent time doing business process analysis and application development may think that this process looks somewhat familiar, and there are parallels. However, ML is difficult to set up, utilize, and manage, from finding (and tailoring) the right infrastructure to setting up and tuning algorithms to tying relevance to model outputs (e.g., how a decision tree arrived at a decision).
AutoML (automated machine learning) is the discipline of automating the machine learning process so that more organizations can deploy machine learning, even those without a team of data scientists and a large staff of IT professionals.
Never ones to miss an opportunity, cloud service providers (CSPs) started offering AutoML services to customers. The promise of each CSP is similar: an organization simply needs to point its data at the service, and the magic happens. Predictions are made, decisions are implemented, and everything is well.
But as we know, with the cloud, promise and reality don't always align. And even when they do, there's always an opportunity to be better, faster, cheaper, and more secure. With this market opportunity as the backdrop, Oracle released HeatWave ML.
What is HeatWave ML?
HeatWave ML allows users to run training, inference, and explanation natively inside MySQL HeatWave using the familiar SQL interface, with the promise of performing ML functions more easily, faster, more cheaply, and more securely. But does it deliver? And how open and usable is HeatWave ML?
Training is the most expensive phase in the machine learning life cycle and is vital because it determines the accuracy of predictions. Enterprises, therefore, employ data scientists to help train models, a process that involves multiple steps. HeatWave ML fully automates this process inside the database, including pre-processing of data, algorithm selection, intelligent data sampling, feature selection, and hyperparameter tuning.
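To give a feel for what "algorithm selection" and "hyperparameter tuning" mean in practice, here is a minimal Python sketch of an automated search. It is not Oracle's implementation, and nothing in it comes from HeatWave ML: the synthetic data, the two candidate algorithms, and every function name are my own assumptions for illustration. The idea is simply to score each candidate on held-out data and keep the best one, with no human in the loop.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Two noisy 2-D clusters; purely synthetic, for illustration only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


def predict_majority(train, x):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_knn(train, x, k):
    """k-nearest neighbours using squared Euclidean distance."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, train, valid):
    """Fraction of validation rows the candidate labels correctly."""
    hits = sum(predict(train, x) == y for x, y in valid)
    return hits / len(valid)


def auto_select(rows, valid_fraction=0.25):
    """Score every candidate on a held-out split and return the best one."""
    split = int(len(rows) * (1 - valid_fraction))
    train, valid = rows[:split], rows[split:]
    # Candidate algorithms: a trivial baseline plus k-NN.
    candidates = {"majority": predict_majority}
    for k in (1, 3, 5, 9):  # hyperparameter grid for k-NN
        candidates[f"knn(k={k})"] = lambda tr, x, k=k: predict_knn(tr, x, k)
    scores = {name: accuracy(fn, train, valid) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = auto_select(make_toy_data())
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
    print("selected:", best)
```

A production AutoML system searches far more algorithms and settings, and adds pre-processing, sampling, and feature selection, but the control flow is the same: propose candidates, measure them, keep the winner.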
The promise of HeatWave ML is this: customers can accelerate ML initiatives securely at no additional charge and reduce costs by avoiding the need for data scientists. Additionally, all models created by HeatWave ML can be explained, which improves regulatory compliance, fairness, repeatability, and causality, and increases trust in machine learning.
While HeatWave ML's architecture can drive real, measurable value to an organization in terms of performance and cost, I believe the automation built into HeatWave ML will make it tangibly easier for customers to use, extending ML beyond the realm of data scientists. With HeatWave ML, Oracle has abstracted a lot of the fine-tuning and pseudo witchcraft that organizations must employ to make an ML training model "just right." And this applies across the organization: to business analysts and data scientists looking to make sure the intelligence is correct, and to IT professionals being asked to deliver the right overall solution. In addition, HeatWave ML enables organizations to use popular notebook tools such as Jupyter and Apache Zeppelin.
By fully automating the ML training lifecycle inside the database, HeatWave ML saves customers time and effort. Customers can rest assured that HeatWave ML will generate a well-tuned model without human intervention. Furthermore, the generated model is stored securely inside the database, and both the model and its predictions can be explained if needed.
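As a rough sketch of what this train, predict, and explain workflow looks like from a client, the Python helper below only assembles the SQL statements a user would send; it does not connect to a database. The routine names (sys.ML_TRAIN, sys.ML_MODEL_LOAD, sys.ML_PREDICT_TABLE, sys.ML_EXPLAIN_TABLE) follow Oracle's HeatWave ML documentation as I understand it, and argument lists have varied between releases, so check the documentation for your version. The helper name, the table names, and the column name are hypothetical.

```python
def heatwave_ml_statements(train_table, target_column, score_table, output_prefix):
    """Return, in order, the SQL for training, loading, predicting, and explaining.

    Hypothetical helper: it only builds strings. Names are interpolated
    directly, so pass trusted identifiers only.
    """
    return [
        # Train: the service chooses the algorithm, features, and hyperparameters.
        f"CALL sys.ML_TRAIN('{train_table}', '{target_column}', "
        "JSON_OBJECT('task', 'classification'), @model);",
        # Load the stored model before using it.
        "CALL sys.ML_MODEL_LOAD(@model, NULL);",
        # Inference: write a prediction for every row of score_table.
        f"CALL sys.ML_PREDICT_TABLE('{score_table}', @model, "
        f"'{output_prefix}_predictions');",
        # Explanation: write explanations for those predictions.
        f"CALL sys.ML_EXPLAIN_TABLE('{score_table}', @model, "
        f"'{output_prefix}_explanations');",
    ]


if __name__ == "__main__":
    # All names below are made up for illustration.
    for statement in heatwave_ml_statements(
        "demo.census_train", "revenue", "demo.census_test", "demo.census"
    ):
        print(statement)
```

The point is less the exact syntax than the shape of the workflow: four SQL calls, with no data leaving the database and no separate ML service to manage.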
What happens when my datasets grow?
Part of what makes machine learning environments so hard to maintain is the constant care and feeding required to maintain performance and accuracy. And as one would imagine – as the data in my organization grows, I need more computational resources for training and inference.
As part of HeatWave ML, Oracle automates the process of assigning resources and optimizing training as datasets grow. The goal is to allow customers to achieve equally fast results on training larger datasets in bigger clusters without sacrificing accuracy.
In the above example, you can see where HeatWave ML autotune outperformed more manual tuning as cluster size grew. The benefits of such a capability are not limited to performance; they also include the time saved by the professionals tasked with tuning the model for the larger cluster. The impact? Hands-free retraining and improved accuracy of predictions.
Does HeatWave ML perform as advertised?
Every tech vendor touts benchmark claims around performance and cost. Oracle, by contrast, offers full transparency into its benchmark configurations, scripts, and all the data customers need to replicate the benchmarks for HeatWave ML, just as it did for previous releases of HeatWave.
As part of this announcement, Oracle also included TPC-DS benchmark results. The TPC-DS benchmark has more queries, more complex query constructs, a more complex schema, and higher data skew than TPC-H, for which Oracle provided results in August 2021. When running the queries in the TPC-DS benchmark (10TB dataset), Oracle demonstrated an incredible 14.4x price/performance advantage over Snowflake.
Oracle's MySQL HeatWave is a cloud database service that has been architected for the commodity cloud and is in a state of continuous innovation. Further, the HeatWave engineering team has filed several dozen patents for the algorithms and technology; the HeatWave ML technology alone has around two dozen patents. The company has taken a service that provided real differentiation from its inception and continues to innovate on it at an accelerated pace for data-driven organizations.
As a former IT professional who managed a lot of data, I like how the MySQL HeatWave service takes otherwise complex and expensive functions and makes them available to the masses. MySQL HeatWave simplifies data management, democratizing the latest cloud database technology beyond data scientists and DBAs.
Oracle clearly understands the needs of its customers and the data management market incredibly well. The way the company continually innovates in significant ways is impressive. And solutions like HeatWave ML should help the company expand its OCI customer base. I will certainly check back over the next few quarters to report on its progress.