Query Folding in Action

Introduction: 

Folding in Power Query improves performance and decreases the load on Power BI (Business Intelligence) at the risk of exposing data between different sources. If you have not read my previous article, check it out because this one builds upon the concepts discussed in the previous article. Let us apply what we learned in the previous blog and get our hands dirty with Query Folding. 

What do you need to know? 

This article requires intermediate level understanding, so I will assume that you have a basic understanding of Power Query and know how to create, transform, and merge data sources. 

Premise 

The task’s premise is kept very simple, and the idea is to demonstrate how Query Folding can help improve a report’s performance. We have two data sources for the report: 

  1. An excel file that contains the SSN of customers. 
  2. A SQL Database that contains the purchases made by the customers. 

Our goal is to get the data from the SQL Database, where the customer SSN matches those in the excel file. So, the excel data works as a filter for the SQL data. 

Before moving on with the tutorial, we need to look at a concept called Privacy Settings in Power BI, and I promise this is the only piece of theory that we will be discussing in this blog. There is an entire blog for Data Privacy Settings in Power BI in which we have explained it in detail. However, here is a summary of the main concept: 

  1. None: This is the default privacy level. 
  2. Private: It means that the data of the source is highly confidential, and it cannot be passed to other data source. 
  3. Organizational: This privacy level is befitting for data sources that belong to the same organization; for example, if excel and SQL server have been configured using the same organizational domain, and both sources have privacy level set to Organizational, then the two sources can pass data to each other. 
  4. Public: This privacy level suggests that the data in the data source is publicly available and can be passed to any other data source. 

This article will use the Organizational Privacy Level for both data sources so that the data can be passed from Excel to SQL Server, and Query Folding can take place. In comparison, we will change the privacy level to private to prohibit query folding. So, by tweaking the privacy level, we can control whether folding happens or not and, consequently, gauge the performance of the two approaches. 

Walkthrough 

We have a dump of Stack Overflow 2010 database. We have a Comments table that contains the comments made by every user on Stack Overflow. We want to filter these comments based on User IDs given in a separate Excel file. Here is a step-by-step walkthrough of the process:  

  1. Create an excel data source and fetch Users from it in Power Query. 
  2. Create a SQL Server data source and fetch Comments table in Power BI. 
  3. Go to Data Source settings and change the privacy level of both the sources to “Public”.

   4. In the Comments dataset, add a new step to Merge the two datasets based on an inner join between the Comments.UserID, and Excel.ID columns.

    5. Click Close & Apply. We will see that Power BI only fetches 29000 rows from SQL Server and three rows from Excel. 

   6. If we also want to check the SQL query that was passed to SQL Server, we can use the SQL Server Profiler. Power BI sends the following query to SQL Server.

    7. We can see that Query Folding has taken place from the query because the Filtering and Merge have been shifted to SQL Server instead of Power BI. 

   8. Now, let us try the other option that does not involve Query Folding. Go to the Data Source Settings again and change the Privacy Level to Private. 

   9. Refresh the data sources again and click Close & Apply. 

   10. Now Power BI fetches 3 million comments and does the filter and merging in Power Query rather than sending it back to SQL Server. Following query is sent to SQL Server by Power BI.

   11. We can see that the Query Folding has not taken place here. Therefore, we are fetching 3 million rows instead of just 29 thousand rows. 

Conclusion 

By tweaking the data privacy level of the data sources, we were able to switch the Query Folding on and off to understand the difference between the optimization of the two approaches. Query Folding is much faster and uses less storage to perform the tasks.