This article is in continuation of the previous two blogs on Query Folding [Reference Blog: Query Folding in Power BI], and it builds upon the concepts discussed there. Hence, it is imperative that you understand query folding to know the motivation behind different privacy settings in Power BI. However, I will try to summarize the discussion in a single line here: Query Folding is the process of delegating more work to the source system rather than Power Query for efficient processing of data, at the risk of exposing data between multiple data sources.
To minimize this risk, Power BI has introduced Privacy Settings to control which data can be exposed through Query Folding and which data should be protected at all cost.
Privacy Levels in Power BI
There are four types of Privacy levels in Power BI:
This is the default Privacy Level set to each data source when it is first queried in Power BI. If the report has only one data source, there is no need to change this privacy level because there is no risk of exposing data between various sources. However, as soon as a second data source is created in the report, Power BI asks for the privacy level to be determined for both the data sources because it cannot be kept to None now. The privacy levels need to be determined before Power BI proceeds further.
If we set the Privacy Level of a data source to Private, then Query Folding will never occur on this dataset, nor will the data from this source pass to any other data source for Folding. This is a double-edged sword. The plus point is that we can rest assured that this data source's data is completely protected from other sources and will never be exposed to them. The downside is that this may happen at the cost of performance. From the example of Query Folding in our previous blog, we see a massive performance increase in both time and storage, and we forgo it for additional security.
Let’s say that you are building an Employee Performance report. The report is pulling all the data for each employee from an Excel sheet. The data contains sensitive information such as Date of Birth, SSN, or Credit Card Number. You do not want to risk sending this data to other data sources for additional performance benefits. Hence, you can mark the Privacy Level of this source as Private, and the data will not leave Power BI.
This Privacy Level is a particularly good balance between security and performance. It helps in securing data sources on an organizational level. Between the data sources of the same organization, Query Folding will take place. However, if there is a data source outside the organization, the data will not be shared between the two sources.
Let’s say that you are building an inventory report. You will need data from the warehouse to check how much stock you have for each product. You also need the sales data to analyze whether the stock you have in the inventory will be enough for the next week or not. We are sourcing data from two diverse sources, but they are within the same organization. Hence, if we are merging the two sources, Query Folding will occur for the organizational privacy level.
However, you also need the data about the suppliers to use the report to contact suppliers for a product in case its quantity is insufficient. This data is not from your organization and is hosted on a third-party source. This third-party source could be OneDrive from another domain, Azure Blob Storage, or even SQL Server under another tenant. When you bring this data to your report and merge it with the existing data, Query Folding will not occur for Organizational Privacy Level. Hence, there is no way to bring only the relevant suppliers. You will have to fetch them all and filter the relevant ones in the report. Depending upon the size of the data and the implementation logic, it could be very performance-intensive, but you will also mitigate the risk of exposing your organization’s data to external systems.
The name says it all. There is no security for this Privacy Level. The data from a public source can be passed to other sources as part of Query Folding. There are no restrictions upon this data source, and it gives the best performance when combined with the benefits of Query Folding. However, you need to be completely sure that there are no security concerns with marking a data source as Public.
You are building a Sales Prediction dashboard for a beverage company, and you are pulling live weather data into your report to see how temperature affects the sales of a beverage. This weather data is a public dataset, and there is no harm in passing the weather data to other sources through Query Folding.
So, there we have it. If you are following along, then you have a pretty good understanding of Query Folding and Data Privacy Settings in Power Query. Equipped with this knowledge, you can solve firewall errors half the time. To understand the remaining half, we will have to investigate Partitioning in Power Query, which is our next blog in the series [Reference blog: Partitioning in Power Query].