Amazon Web Services (AWS) is a significant force in the public cloud market. Every year it hosts AWS re:Invent, which users and analysts consider one of the most important annual technical cloud conferences. For the first time in several years, AWS re:Invent 2021 could be attended either live in Las Vegas or online. One of the conference’s main attractions (besides being in Vegas) has always been the number of new product announcements AWS makes each year. As a technical analyst covering artificial intelligence, I was particularly interested this year in the AWS announcements about Amazon SageMaker.
Amazon SageMaker is a fully managed machine learning service that makes it much easier to build and train machine learning models and then deploy them into a production-ready hosted environment. It has made machine learning more accessible, allowed models to be created and run at scale, reduced training time, and led to standardized MLOps practices in many organizations.
This year AWS announced six new Amazon SageMaker features:
· SageMaker Canvas: Provides the ability to generate accurate machine learning predictions using a point-and-click interface, with no coding required.
· SageMaker Ground Truth Plus: A fully managed data labeling service that uses a highly skilled workforce and built-in workflows.
· SageMaker Studio: Makes data engineering, analytics, and machine learning workflows accessible within a universal notebook.
· SageMaker Training Compiler: Helps train deep learning models up to 50% faster by automatically compiling code to make it more efficient.
· SageMaker Inference Recommender: Automatically suggests the optimal AWS compute instances for running machine learning inference with the best price-performance.
· SageMaker Serverless Inference: Provides serverless compute for machine learning inference at scale.
Although I will be covering Amazon SageMaker and its new features in a future article, for this one, I want to highlight two of the six new features.
Amazon SageMaker Ground Truth Plus
A large, labeled dataset is necessary to train a machine learning model properly. Traditional labeling methods are expensive because datasets are enormous and labeling is labor-intensive.
Your options are hiring and managing your own workers to label the data or contracting the work to a company that specializes in data labeling. Alternatively, you can use what I consider to be the most efficient and cost-effective method - Amazon SageMaker Ground Truth Plus. This method combines human workers and machine learning to produce training datasets with high-quality labels.
I can offer my opinion because I used a standalone part of this AWS product numerous times about ten years ago. The part I used is called Amazon Mechanical Turk, a main component of the Ground Truth process. It consists of rated human workers providing contract services to AWS on a bid basis. When I used the service, I needed to label and sort several hundred thousand customer survey responses from a major e-commerce site. Not only was Mechanical Turk fast, but it was also a very cost-effective way to obtain results without the time and expense of hiring, housing, and managing a temporary workforce.
To make the job even more efficient, the Ground Truth process can also use machine learning to determine how your training dataset should be labeled. This feature is called automated data labeling. The ML process decides which data needs to be labeled by human workers and which is suitable for machine labeling.
Although it has been ten years since I used the process, the fact that AWS is still using it speaks for itself. And because all AWS processes are subject to continual improvement, I am confident the service is even more efficient now than when I used it.
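To make the idea of splitting work between people and a model concrete, here is a minimal Python sketch. It is not how Ground Truth is implemented. It assumes a model has already attached a confidence score to each guessed label, and it uses a simple cutoff to decide which items a person should review. The `Prediction` class, the `route_for_labeling` function, the 0.9 threshold, and the sample records are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model guess for one unlabeled item (hypothetical structure)."""
    item_id: str
    label: str
    confidence: float  # model's estimated probability for `label`, 0.0 to 1.0


def route_for_labeling(predictions, threshold=0.9):
    """Keep confident machine labels; send everything else to human workers.

    The threshold is an illustrative assumption, not a Ground Truth setting.
    """
    auto_labeled = {}   # item_id -> label accepted from the model
    needs_human = []    # item_ids that a person should label
    for p in predictions:
        if p.confidence >= threshold:
            auto_labeled[p.item_id] = p.label
        else:
            needs_human.append(p.item_id)
    return auto_labeled, needs_human


# Made-up survey responses, used only to exercise the function.
sample = [
    Prediction("resp-001", "positive", 0.97),
    Prediction("resp-002", "negative", 0.62),
    Prediction("resp-003", "neutral", 0.91),
]
auto_labeled, needs_human = route_for_labeling(sample)
print("machine-labeled:", auto_labeled)
print("sent to humans:", needs_human)
```

In a scheme like this, raising the threshold sends more items to people and costs more, while lowering it saves money at the risk of more labeling mistakes.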
Amazon SageMaker Canvas
Just as I chose to highlight Amazon SageMaker Ground Truth Plus for personal reasons, I have also picked Amazon SageMaker Canvas for a personal reason. This time it is not because I have experience using it, but because I intend to use it.
For background, one of my personal research projects involves collecting data on ionospheric conditions that affect the propagation of very low-power HF radio signals and on the locations those signals reach. Put simply, each HF radio signal is refracted by the ionosphere in a different coordinate plane that depends on the signal’s frequency and angle. The refraction point is further affected by about 40 different space weather factors.
Soon I will have collected data on about 500,000 refracted signals, each refraction influenced by those 40 ever-changing space weather factors. I want to solve this problem: For a given frequency, transmitted from a certain location, at a specific time of day, under prevailing space weather conditions, what locations will receive that signal, and at what strength?
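One way to picture that question as a machine learning problem is as a table in which each row is one received signal, the inputs are columns, and the signal strength is the column to be predicted. The Python sketch below lays out such a row and writes it to CSV text, assuming the data would be handed to a tool as a CSV file. Every column name is my own assumption, the two space weather columns stand in for the roughly 40 factors, and the sample values are placeholders, not measurements.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SignalObservation:
    """One received signal as a table row (all field names are hypothetical)."""
    frequency_mhz: float
    tx_latitude: float
    tx_longitude: float
    utc_hour: int
    solar_flux_index: float    # stand-in for one of the ~40 space weather factors
    kp_index: float            # stand-in for another factor
    rx_latitude: float
    rx_longitude: float
    signal_strength_db: float  # the value a model would be asked to predict


def to_csv(observations):
    """Serialize observations to CSV text with a header row."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(SignalObservation)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))
    return buffer.getvalue()


# Placeholder numbers only; they are not taken from any real dataset.
example = SignalObservation(
    frequency_mhz=14.0, tx_latitude=30.0, tx_longitude=-97.0, utc_hour=18,
    solar_flux_index=100.0, kp_index=2.0,
    rx_latitude=51.0, rx_longitude=0.0, signal_strength_db=-20.0,
)
csv_text = to_csv([example])
print(csv_text)
```

With the data in this shape, the last column becomes the prediction target and the remaining columns become the inputs.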
That is a lot of data. Analysis of this data can only reasonably be done with machine learning. It also happens to be a perfect candidate for Amazon SageMaker Canvas because I am not a data scientist.
Amazon SageMaker Canvas will attract more users for the same reasons I am interested in using it - SageMaker Canvas provides access to machine learning with a visual point-and-click interface. A business analyst will be able to generate an accurate machine learning prediction without prior machine learning experience or experience in writing code for machine learning.
SageMaker Canvas also offers a number of other benefits:
· SageMaker Canvas allows fast access to data regardless of whether it is in the cloud or on-premises. Datasets can be combined or unified for model training.
· SageMaker Canvas also automatically detects and corrects data errors and analyzes data for its readiness for machine learning.
· SageMaker Canvas has a built-in feature called AutoML that automatically creates machine learning models based on a user's unique use case and the dataset.
· Another nice feature is that SageMaker Canvas is integrated with SageMaker Studio, allowing collaboration with other developers and data scientists.