This blog is the first in a series on time series forecasting and its implementation in Azure Machine Learning Service (AutoML). We will cover the basics of time series forecasting and of Azure AutoML, create a Machine Learning workspace in Azure, and familiarize ourselves with the Azure Machine Learning Studio and its main features.
What is Machine Learning?
Machine Learning is a subdomain of Artificial Intelligence (AI) where we use statistical methods to enable machines to learn from data and solve complex problems. These methods can find patterns and insights from complex data. Machine Learning is different from ‘general programming.’ In ‘general programming,’ we code the program to produce an output, based on programmed rules. However, in Machine Learning, we do not explicitly code the program. Instead, we provide questions (input data) and answers (output data), and the machine learns from them and produces an algorithm that we can use for similar data.
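To make the contrast concrete, here is a minimal sketch in plain Python. The temperature-conversion data and the function names are illustrative assumptions, not part of any Azure API. The first function is a hand-coded rule, while the second estimates the same rule from question/answer pairs using ordinary least squares.

```python
# Illustrative only: a hand-coded rule versus a rule learned from examples.


def fahrenheit_rule(celsius):
    """'General programming': the programmer writes the rule explicitly."""
    return celsius * 9 / 5 + 32


def fit_line(inputs, outputs):
    """Machine learning in miniature: estimate slope and intercept
    from (input, output) pairs with ordinary least squares."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
        / sum((x - mean_x) ** 2 for x in inputs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Questions (inputs) and answers (outputs); no conversion rule is supplied.
celsius_readings = [-10.0, 0.0, 10.0, 20.0, 30.0]
fahrenheit_readings = [14.0, 32.0, 50.0, 68.0, 86.0]

slope, intercept = fit_line(celsius_readings, fahrenheit_readings)


def learned_rule(celsius):
    """Apply the parameters that were learned from the data."""
    return slope * celsius + intercept


print(round(learned_rule(25.0), 2), fahrenheit_rule(25.0))
```

Real machine learning models are far more flexible than a straight line, but the workflow is the same: the parameters come from the data rather than from the programmer.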
Time Series Forecasting
Time series data is a sequence of data points recorded in time order at a consistent frequency (for example, hourly, daily, or monthly). A time series is usually described in terms of four components:
- level (average value)
- trend (increasing/decreasing)
- seasonality (repeating short-term cycles)
- noise (random variations in the series)
Time series forecasting predicts future values of a series based on its historical values. It differs from standard regression or classification machine learning problems because time is the essential feature. For example, a classification model predicting whether a person has diabetes can often keep making predictions year after year without changes. However, a time series model predicting retail sales will have to undergo retraining cycles to keep up with changing sales patterns. This makes time series forecasting more challenging than regression or classification.
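The four components and the idea of forecasting from history can be sketched in a few lines of plain Python. This is only an illustration: it assumes an additive model (level + trend + seasonality + noise), the numbers are made up, and the seasonal-naive baseline is a simple stand-in rather than what Azure AutoML uses.

```python
import math
import random

random.seed(0)  # fixed seed so the noise is reproducible

PERIOD = 12    # assumed season length (e.g. monthly data with a yearly cycle)
N_POINTS = 48  # four full seasons of history

level = 100.0  # average value of the series
series = []
for t in range(N_POINTS):
    trend = 0.5 * t                                          # steady increase
    seasonality = 10.0 * math.sin(2 * math.pi * t / PERIOD)  # repeating cycle
    noise = random.gauss(0, 1.0)                             # random variation
    series.append(level + trend + seasonality + noise)


def seasonal_naive_forecast(history, period, horizon):
    """Forecast each future step with the value observed one season earlier."""
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]


forecast = seasonal_naive_forecast(series, PERIOD, horizon=6)
print([round(value, 1) for value in forecast])
```

Note that this baseline repeats the last season and ignores the trend, so it drifts further from reality as new data arrives. That is one small illustration of why forecasting models need to be retrained.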
Introduction to Azure Machine Learning (Azure AutoML)
Generally, a Machine Learning based AI solution consists of the following steps: data gathering, cleansing, transforming, and trying various machine-learning algorithms before finding the most suitable one for the problem. This is an iterative process where you must code every single model.
AutoML on Microsoft Azure’s cloud platform automates the iterative process of coding various machine learning algorithms. Given clean data, Azure AutoML assists with feature engineering and runs parallel machine learning experiments using suitable machine learning algorithms. Once a model meets the metric threshold you provide, you can take it to the production stage.
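Conceptually, the loop that AutoML automates looks like the sketch below. This is not Azure AutoML’s actual implementation or API. The three candidate models, the toy data, and `METRIC_THRESHOLD` are hypothetical, chosen only to show the pattern of trying several models, scoring each on held-out data, and keeping the best one.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical candidate models: each maps (history, horizon) to a forecast.
def last_value_model(history, horizon):
    return [history[-1]] * horizon


def mean_model(history, horizon):
    average = sum(history) / len(history)
    return [average] * horizon


def drift_model(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


candidates = {
    "last_value": last_value_model,
    "mean": mean_model,
    "drift": drift_model,
}

# Toy data, split in time order: train on the past, validate on the future.
history = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
train, validation = history[:6], history[6:]

# Score every candidate on the validation window.
scores = {
    name: mean_absolute_error(validation, model(train, len(validation)))
    for name, model in candidates.items()
}

best_name = min(scores, key=scores.get)
METRIC_THRESHOLD = 1.0  # hypothetical acceptance threshold for the metric
ready_for_production = scores[best_name] <= METRIC_THRESHOLD

print(best_name, scores[best_name], ready_for_production)
```

AutoML runs this kind of search at a much larger scale, in parallel, and also handles feature engineering, so you do not have to write the loop yourself.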
Preparing the Resources to train a Machine Learning model on Azure Machine Learning Service (AutoML)
Now that we understand what time series forecasting is and why we should be using Azure AutoML to train our Machine Learning models, the next step is to create an Azure Machine Learning Workspace that we will use to train Machine Learning models.
First, head to https://portal.azure.com and sign in. If you do not have an account, create one at https://account.microsoft.com/account. You also get free credits that you can use to follow along with this blog series.
Once you sign into your Azure Portal, you will see a window as follows:
In the search bar, type ‘Machine Learning’ and click on Machine Learning under Services, as shown below. Then click on the ‘create’ button in the top left corner.
After you have clicked ‘create,’ you will be directed to a window that will ask for some basic information regarding additional resources you need to create a Machine Learning workspace on Azure. The Azure resources created alongside a workspace include:
- Storage account: to store files used by the workspace and for data related to experiments and model training
- Key Vault: to manage authentication keys and credentials
- Application Insights: to monitor predictive services within the workspace
- Container registry: to manage containers for deployed models
Also, it is essential to note the overall structure of subscription and workspace on the Azure cloud. The following figure by Microsoft does an excellent job in this regard:
Going back to creating the Machine Learning workspace, once you have clicked on ‘create,’ choose your subscription under the ‘Basics’ tab, and then provide names for the Resource Group, Storage Account, Key Vault, Application Insights, and Container Registry. You can select ‘create new’ to give custom names to all these resources. For the storage account, choose locally redundant storage replication, and for the container registry, you can select the Standard SKU.
Once you have provided the necessary information, click on ‘Review + create.’ Azure will then run validation. Once validation is complete, click ‘create,’ which will trigger the deployment of all the resources. The deployment may take 3-5 minutes. Once the deployment is complete, click ‘Go to resource.’
In the new window that opens, click Launch studio. This will take you to the newly created Azure Machine Learning studio.
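Since later posts in this series use Python, it can be handy to record the workspace details now. The sketch below writes them to a `config.json` file. It assumes the Azure ML Python SDK v1 (`azureml-core`), whose `Workspace.from_config()` reads a file with these three keys. The values are placeholders to replace with your own, and the same file can usually be downloaded from the workspace’s overview page in the Azure Portal.

```python
import json
from pathlib import Path

# Placeholder values: replace them with the details of your own workspace.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Write the file to the current working directory.
config_path = Path("config.json")
config_path.write_text(json.dumps(workspace_config, indent=2))

# Read it back to confirm that the file is valid JSON with the expected keys.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

With the SDK installed and the placeholders filled in, `Workspace.from_config()` should then be able to connect to the workspace without the identifiers being hard-coded in your notebooks.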
Navigating the Azure Machine Learning Studio
Before training machine learning models, it is important to familiarize ourselves with the Azure Machine Learning Studio. This will help us understand Azure AutoML’s requirements when we later code our time series forecasting model with the Python API. As you open the Azure Machine Learning Studio, you will see the following tabs:
Home: Home page
Notebooks: This lets you write code in Jupyter notebooks in your workspace and organize notebooks, files, and folders.
Automated ML: This lets you run an automated training experiment using a graphical interface. This is an option for no-code or low-code users.
Designer: This lets you build data transformations and machine learning models using a graphical interface. This is also an option for no-code or low-code users.
Datasets: These are references to data in the Azure storage account. A dataset can be either a File dataset or a Tabular dataset. For more information, see ‘Create Azure Machine Learning datasets’ on Microsoft Docs.
Experiments: This is a record of your machine learning runs (training, inferencing, pipelines).
Pipelines: These are the pipelines with one or more steps that you have created. You can also publish the pipelines to get REST endpoints. These pipelines can be run from Azure Data Factory (ADF) or Azure Synapse and can be used for inferencing (predictions) or training/retraining models.
Models: These are the models that you have registered in your workspace from training runs. Azure Machine Learning Service will also track different versions of the same model that you have trained.
Endpoints: Once you have a machine learning model, you can deploy it as a real-time endpoint or a batch endpoint.
Compute: This is where you can provision and manage your compute resources. Compute resources can be of the following types:
- Compute Instance: A single node to run Jupyter notebooks
- Compute Cluster: Multiple CPU- or GPU-based nodes to run training or inferencing experiments and pipelines
- Inference Clusters: Azure Kubernetes Service (AKS) based cluster for large-scale inferencing jobs
- Attached Computes: You can attach external compute resources such as Azure Databricks, HDInsight cluster, Data Lake Analytics, or Synapse Spark pool (currently in preview)
Environments: These are curated environments that you can use for your experiments. You can also create custom environments as needed.
Datastores: Just as datasets are references to data on Azure Storage, datastores are references to blob storage, Azure File Share, Azure Data Lake Gen 1 and Gen 2, or Azure SQL databases.
Data Labelling: This helps you label image data.
Linked Services: This is a collection of integrations with external services. It is currently in preview.
Now, we know what time series forecasting is. We have also created a Machine Learning Workspace in Azure and have learned how to navigate the Azure Machine Learning Studio. In the next blog, we will use this knowledge to create a time series forecasting model on Azure Machine Learning Service (AutoML) to forecast Orange Juice Sales using an open-source dataset provided by Microsoft.