As more enterprises adopt Power BI into their BI environment, questions still remain about data security. In his last webinar, our Business Intelligence Architect Steve Hughes discussed data security and compliance within the Power BI platform including data classification, privacy levels and other settings that help manage security.
Expanding on his extensive knowledge of the topic, Steve has written a series of blog posts to answer specific questions about the service. The series covers the following topics: privacy levels, data classification, sharing data, and compliance and encryption.
The Power BI service is updated frequently. These articles are based on the Power BI implementation as of early April 2017. Newer releases may include improvements and changes that affect your experience. Feel free to add comments to highlight changes.
The On-premises Data Gateway (a.k.a. Enterprise Gateway)
The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Feel free to review the references at the end of this post for details about data sources supported within the gateway.
The gateway enables Power BI to use on-premises data for data refresh and for direct access with DirectQuery and Live Connections (SSAS multidimensional and tabular models). It manages connectivity and data transfer between on-premises data and Power BI, with data compression and transport encryption built into the solution. Our focus here is on the most common questions about the gateway’s use with Power BI: how the gateway itself is secured and how data is secured when it passes through the gateway.
Security on the Gateway
When the gateway is installed, the default service account NT Service\PBIEgwService is created as a Windows service login credential. This credential has “log on as a service” permissions. The first item to note: this credential is NOT used to access data sources. This service account has localized permissions to the server or PC it is installed on. It has no permissions to on-premises data sources or cloud services that use it.
In some situations, the default service account can create issues with proxy servers. If you run into this, you can change the service to run under a domain account. Refer to the proxy configuration documentation to make that change. The recommendation is to use a managed service account in Active Directory, because a password reset on a regular domain account will disable the gateway and likely frustrate users.
Data Sources in the Gateway
While the gateway service account does not have access to services or data sources, the gateway does have the capability to decrypt the connection information used by Power BI to connect to on-premises services. When you add a data source to the gateway you created, the credentials are encrypted using the key from the gateway.
Each gateway can manage multiple data sources. (NOTE: Best practices about location and performance of the gateway are not in scope of this post.) In my example, the gateway is providing access to a folder which contains receipt files. This will allow my Power BI solution to refresh data from the source. I can add a SQL Server connection as well if it is in the same network or context. The key here is that the gateway is an entry point for your on-premises data and is not limited to a single data source.
Credentials stored within the gateway cannot be decrypted in the cloud. The credentials are only decrypted by the gateway. When considering maintenance and configuration, it is important to know that this is one of the key purposes of the gateway. Without a gateway, Power BI cannot access data in your on-premises solution. Gateways are also required for Azure IaaS solutions. Azure SQL Database and Azure SQL DW, however, do not require gateways because they are PaaS solutions and are managed differently within Azure.
All data and information exchanged between the gateway and Power BI is encrypted. One of the primary concerns is which ports must be opened and which protocol carries this communication.
The first important item to cover is that there are no inbound ports used by the gateway. The gateway creates an outbound connection to the Azure Service Bus using a specific set of ports including TCP 443 which is used for Power BI (complete list of ports used). It is possible to force the gateway to use HTTPS in lieu of direct TCP for all of its communication. If you require this as an organization, be aware that there may be performance issues. This setting can be changed in the gateway properties and will require a restart of the service.
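Because the gateway only makes outbound connections, a quick way to validate a firewall configuration is to test outbound TCP reachability from the gateway machine. The following is a minimal sketch using only the Python standard library. The function name check_outbound_ports is hypothetical, and the port list is an assumption based on Microsoft's gateway documentation from the time of writing (TCP 443, 5671, 5672, and 9350 through 9354); confirm it against the complete list of ports referenced above before relying on it.

```python
import socket
from typing import Dict, Iterable

# Assumed outbound ports (verify against the current Microsoft documentation).
ASSUMED_GATEWAY_PORTS = [443, 5671, 5672, 9350, 9351, 9352, 9353, 9354]


def check_outbound_ports(host: str, ports: Iterable[int],
                         timeout: float = 3.0) -> Dict[int, bool]:
    """Try an outbound TCP connection to each port and report success.

    Returns a dict mapping each port to True (connection opened) or
    False (refused, timed out, or the host could not be resolved).
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection opens an outbound socket; nothing listens locally.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[port] = False
    return results
```

To use it, call check_outbound_ports with the Service Bus host name for your environment (not shown here, since it varies by tenant and region) and ASSUMED_GATEWAY_PORTS. A False entry only shows that the TCP connection could not be opened from that machine; it does not tell you which firewall or proxy blocked it.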
Data and the Gateway
The second primary question regarding the gateway is about how data is handled. When a request for data is submitted from Power BI, the Azure Service Bus holds the request along with the encrypted credentials. The on-premises data gateway polls the Azure Service Bus for requests. Once the gateway receives the request, it decrypts the connection information and sends the query to the appropriate data source. The resulting data is then encrypted and compressed at the gateway and returned to Power BI.
No data is stored in the gateway and the data is encrypted for transit.
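The request flow described above can be illustrated with a small simulation. This is a toy model only, not Microsoft's implementation: the class and function names are hypothetical, a local queue stands in for the Azure Service Bus, and toy_cipher is an XOR placeholder that marks where real encryption would happen (it is not secure). The sketch compresses before encrypting, which is an assumption about ordering that this post does not specify.

```python
import json
import queue
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR placeholder for real encryption. Symmetric and NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Request:
    encrypted_credentials: bytes  # readable only with the gateway's key
    query: str


class ToyGateway:
    """Hypothetical stand-in for the on-premises gateway."""

    def __init__(self, gateway_key: bytes, transport_key: bytes,
                 sources: Dict[str, Callable[[str], List[dict]]]) -> None:
        self._gateway_key = gateway_key
        self._transport_key = transport_key
        self._sources = sources  # maps a source name to a query function

    def poll_once(self, bus: "queue.Queue[Request]") -> Optional[bytes]:
        """Pull one pending request (outbound poll) and return the response."""
        try:
            request = bus.get_nowait()
        except queue.Empty:
            return None  # nothing waiting on the bus
        # Credentials are decrypted only here, on the gateway side.
        creds = json.loads(
            toy_cipher(request.encrypted_credentials, self._gateway_key))
        rows = self._sources[creds["source"]](request.query)
        # Compress, then "encrypt" for transit; nothing is kept on the gateway.
        compressed = zlib.compress(json.dumps(rows).encode("utf-8"))
        return toy_cipher(compressed, self._transport_key)


def read_response(payload: bytes, transport_key: bytes) -> List[dict]:
    """Service side: reverse the transit encryption and compression."""
    return json.loads(zlib.decompress(toy_cipher(payload, transport_key)))
```

The point of the sketch is the direction of the calls: the service only places a request on the queue, and the gateway pulls it. Nothing in the model ever opens a connection into the on-premises side.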
Users and the Gateway
One last consideration is who can use a gateway. In the Power BI service, you can manage access to data sources by user when managing the gateway (see diagram about Data Sources). This functionality also supports security groups. When implemented, only users who have access to the data source can use it for the Power BI datasets that they are deploying. This prevents users from publishing content that would require direct access or data refresh to sources they should not use.
Users who are able to use the gateway will have access to refresh scheduling and other options via the dataset properties (I use the Schedule Refresh option to open the dialog).
There are several considerations for enterprises that plan to implement gateways in their organizations. The key is to remember that the gateway is a bridge that allows on-premises data to be accessed by cloud services. However, the cloud services do not initiate a direct request to the on-premises data. Microsoft has done a great job allowing for a hybrid approach that enables organizations to take advantage of cloud resources while minimizing the impact on their on-premises assets.
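The access rule described in this section (a user may use a data source if they are listed directly or belong to a listed security group) can be modeled in a few lines. This is a conceptual sketch with hypothetical names; it does not call the Power BI service or Active Directory, and group membership is passed in as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataSourceAccess:
    """Hypothetical access list for one gateway data source."""
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


def can_use_data_source(user: str, access: DataSourceAccess,
                        group_members: Dict[str, Set[str]]) -> bool:
    """Return True if the user is listed directly or via a security group."""
    if user in access.users:
        return True
    # Check membership in each group that was granted access.
    return any(user in group_members.get(group, set())
               for group in access.groups)


# Example data (all names are made up for illustration).
receipts_access = DataSourceAccess(users={"steve"}, groups={"finance-analysts"})
members = {"finance-analysts": {"maria", "raj"}}
```

With this example data, can_use_data_source("maria", receipts_access, members) is True through group membership, while a user who is neither listed nor in the group is denied.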