All About Azure Synapse Analytics and Its Machine Learning Experiences

What is Azure Synapse? 

Azure Synapse is a scalable, cloud-based data warehousing solution from Microsoft that uses MPP (Massively Parallel Processing) to run complex queries across large volumes of data on the Microsoft Azure cloud platform. Azure Synapse evolved from Azure SQL Data Warehouse, so it includes all the features of SQL Data Warehouse along with additional ones such as integration with BI, Machine Learning, and Data Analytics. These additional features make Azure Synapse a distinctive and powerful data analytics solution.

Some of the additional features include: 

  • Workload optimization 
  • Cloud-native HTAP integration
  • BI and ML integration 
  • Advanced Security 
  • Ability to process non-relational data
  • Serverless and dedicated options 


No matter what type of data you have, Azure Synapse brings it together on a single platform for analysis. In addition, Azure Synapse lets you query data using either the on-demand serverless model or provisioned resources. The on-demand serverless mode lets you pay per query, only for what you need and when you need it. In the provisioned mode, resources are assigned to the service in advance.
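To make the difference between the two billing models concrete, here is a minimal Python sketch. The rates PRICE_PER_TB and PRICE_PER_HOUR and the function names are hypothetical placeholders chosen for illustration, not real Azure prices or APIs. The sketch assumes serverless cost grows with the data scanned per query, while provisioned cost grows with the hours the resources stay allocated.

```python
# Hypothetical rates for illustration only; these are NOT real Azure prices.
PRICE_PER_TB = 5.0      # assumed cost per terabyte processed (serverless)
PRICE_PER_HOUR = 1.5    # assumed cost per hour of provisioned compute


def serverless_cost(tb_per_query, queries, price_per_tb=PRICE_PER_TB):
    """Pay per query: cost scales with the data each query processes."""
    return tb_per_query * queries * price_per_tb


def provisioned_cost(hours_running, price_per_hour=PRICE_PER_HOUR):
    """Provisioned: cost scales with how long resources stay allocated."""
    return hours_running * price_per_hour


def cheaper_mode(tb_per_query, queries, hours_running):
    """Return the cheaper mode and its estimated cost for one billing period."""
    s = serverless_cost(tb_per_query, queries)
    p = provisioned_cost(hours_running)
    return ("serverless", s) if s <= p else ("provisioned", p)


# Occasional small queries versus a heavy, steady workload over 720 hours.
print(cheaper_mode(0.01, 200, 720))
print(cheaper_mode(0.5, 5000, 720))
```

Under these assumed rates, sporadic queries favor the serverless model and a steady, heavy workload favors provisioned resources. Where the break-even point falls depends on actual pricing.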

Architecture 

The architecture of Azure Synapse includes the following four components.

1. Apache Spark: Apache Spark is a leading platform for running SQL queries and ML analysis on large volumes of data. In Synapse, it involves the following concepts.

  • Apache Spark for Synapse 
  • Apache Spark pool 
  • Spark application 
  • Spark session 
  • Notebook 
  • Spark job definition 

2. Synapse SQL: Synapse offers complete T-SQL-based analytics for relational and non-relational data. It comes in two consumption models.

  • SQL pool, now called dedicated SQL pool (pay by the provisioned compute unit)
  • SQL on-demand, now called serverless SQL pool (pay by the number of terabytes processed)

3. Synapse Pipelines: Hybrid data integration that connects different data sources so that you can quickly ingest data from diverse systems into your data warehouse.

4. Synapse Analytics Studio: A unified workspace for performing all cloud-based enterprise analytics in Azure, bringing the analytics tools for BI, ML, and AI together in a single place.


Azure Synapse and Machine Learning 

Azure Synapse integrates with various other Azure services, including Power BI, Azure ML, and Cosmos DB. This section focuses on Azure Synapse and Machine Learning.

The Machine Learning workflow lets data professionals apply their existing data processing techniques and model development skills. It allows them to build and use the available models easily, with high efficiency, scale, and productivity.

Today, there is a lot you can do in Azure Synapse through code. If you are well-versed in coding or have worked with notebooks and T-SQL, you can take the code-first route. Additionally, to shorten the time to insight, Azure Synapse has added guided UI experiences. These experiences generate code artifacts for you, making the task easier for users of all skill levels.

Synapse users can reuse the generated code at no extra charge and use the code-first experiences to train their ML models.

Some of the ML experiences in Azure Synapse are as follows: 

  • Model Training with Auto ML: Guided UI experience to train ML models using Auto ML, powered by Azure ML.
  • Model Scoring in SQL pools: Guided UI experience to deploy a model from Azure ML into Azure Synapse.
  • Cognitive Services: Guided UI experience for data enrichment with Anomaly Detector and Text Analytics - Sentiment Analysis.

Model Training with Auto ML 

Auto ML helps you train ML models without writing a single line of code. Even if you do not have much experience in machine learning, you can still train a model using Auto ML. Auto ML trains a set of candidate models automatically and selects the best one based on the metric you specify. It helps turn Synapse Analytics data into actionable baseline models that can enrich datasets at scale.
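The idea of choosing the best model by a metric can be illustrated with a small, self-contained Python sketch. This is a conceptual illustration only and does not use the Azure ML Auto ML API. The two candidate models, the toy data, and the helper names (fit_mean, fit_linear, select_best) are all hypothetical. It assumes a lower RMSE on held-out data means a better model.

```python
import math


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate: one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def select_best(candidates, train, valid, metric=rmse):
    """Fit every candidate on train, score on valid, return the lowest-scoring one."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    scores = {name: metric(model, *valid) for name, model in fitted.items()}
    best = min(scores, key=scores.get)
    return best, fitted[best], scores


# Toy data, made up for illustration.
train = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
valid = ([6, 7], [12.1, 13.9])
best_name, best_model, scores = select_best(
    {"mean_baseline": fit_mean, "linear": fit_linear}, train, valid
)
print(best_name, scores)
```

A real Auto ML run searches over many more algorithms and hyperparameters, but the selection step follows the same pattern: score each candidate on the chosen metric and keep the winner.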

Model Scoring in SQL pools 

SQL pools allow users to score their models using the T-SQL language. The T-SQL PREDICT function lets users apply existing ML models, trained on historical data, to score new data within the boundaries of the data warehouse. The PREDICT function takes a model in ONNX (Open Neural Network Exchange) format as input. The guided UI experience helps you deploy an ONNX model from Azure ML to Azure Synapse for batch scoring with the PREDICT function. The model is trained outside Synapse SQL, then loaded into the data warehouse, and finally used to score data with the T-SQL PREDICT syntax.
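As a rough sketch of what the batch-scoring statement can look like, the Python helper below composes a T-SQL query around PREDICT with RUNTIME = ONNX. The table names (dbo.models, dbo.sales_features), the columns model and id, and the output column Score are hypothetical. It assumes the ONNX model has already been loaded into a table as a binary column. The helper only builds the query text and does not connect to a database.

```python
import re

# Accepts simple names such as "Score" or "dbo.models"; rejects anything else
# so arbitrary text cannot be spliced into the generated SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _check(name):
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_predict_query(model_table, model_id, data_table, output_column,
                        output_type="float"):
    """Compose a T-SQL batch-scoring statement around the PREDICT function.

    Assumes (hypothetically) that model_table has an integer `id` column and a
    varbinary `model` column holding the ONNX model.
    """
    return (
        f"SELECT d.*, p.{_check(output_column)}\n"
        f"FROM PREDICT(\n"
        f"    MODEL = (SELECT model FROM {_check(model_table)} "
        f"WHERE id = {int(model_id)}),\n"
        f"    DATA = {_check(data_table)} AS d,\n"
        f"    RUNTIME = ONNX\n"
        f") WITH ({_check(output_column)} {_check(output_type)}) AS p;"
    )


query = build_predict_query("dbo.models", 1, "dbo.sales_features", "Score")
print(query)
```

The generated statement would be run in a dedicated SQL pool. The WITH clause has to match the outputs the ONNX model actually produces, so the column name and type here are placeholders.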

Cognitive Services 

Azure Synapse users can select a table to enrich with anomaly detection, or select a table with a text column to enrich with sentiment, using pre-trained ML models. This allows users who do not possess advanced machine learning knowledge to enrich their data easily. The user has two options: Anomaly Detector and Text Analytics - Sentiment Analysis. The MMLSpark library is used to connect Azure Synapse with Cognitive Services. The guided experience generates a notebook with PySpark code that uses Azure Cognitive Services to detect anomalies or perform sentiment analysis.
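To show what "enriching" a table means in practice, here is a small local stand-in written in plain Python. It does not call Azure Cognitive Services or MMLSpark. Instead, it swaps in a simple z-score rule to add an is_anomaly column to each row, which is roughly the shape of result an anomaly-detection enrichment adds. The function name, column names, threshold, and sample readings are all hypothetical.

```python
from statistics import mean, pstdev


def enrich_with_anomaly_flag(rows, value_key, threshold=2.0):
    """Return copies of rows with an added boolean 'is_anomaly' column.

    Stand-in logic: a value is flagged when it lies more than `threshold`
    population standard deviations from the mean. A pre-trained service
    would use a far more capable model.
    """
    values = [row[value_key] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    enriched = []
    for row in rows:
        z = 0.0 if sigma == 0 else (row[value_key] - mu) / sigma
        enriched.append({**row, "is_anomaly": abs(z) > threshold})
    return enriched


# Made-up sensor readings with one obvious spike.
readings = [
    {"ts": i, "value": v}
    for i, v in enumerate([10, 11, 9, 10, 12, 10, 95, 11, 10, 9])
]
flagged = [r["ts"] for r in enrich_with_anomaly_flag(readings, "value") if r["is_anomaly"]]
print(flagged)
```

The original rows are left untouched and the enrichment is returned as a new column. In the Synapse experience, the generated notebook would write the enriched output alongside the source data.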

Conclusion 

In this way, Azure Synapse combines enterprise data warehousing and Big Data analytics into a single platform. By bringing together multiple cloud platforms, it helps customers add massive value to their offerings. Its integration with Machine Learning provides high efficiency and productivity while sustaining model quality. These ML experiences shorten the time required to train models and make the process easier.

If you have questions about Azure Synapse Analytics, feel free to reach out to us. We will be more than happy to help!

 
