In today’s post, I’d like to review what an Azure SQL Data Warehouse is, as well as its benefits to performance and efficiency. If you don’t know, Azure SQL Data Warehouse is a service that is Microsoft’s offering of a public data center in Azure.
It’s a Platform as a Service (PaaS) offering, so it allows you to develop, run and manage your data applications without the complexities of building and maintaining the infrastructure typically associated with developing and launching a traditional data warehouse. All you must be concerned with is loading and querying the database.
Azure Data Warehouse leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. An MPP system executes queries across a distributed system, where different computers, called nodes, cooperate to answer your queries. Azure SQL Data Warehouse relies upon distributed storage into multiple machines and these multiple machines compute answers to your queries.
A traditional data warehousing scenario is typically comprised of one large machine (physical or virtual) and typically on premise utilizes symmetric multi-processing (SMP). An example would be a classic SQL Server setup. In contrast, an MPP system is comprised of multiple machines and the MPP machines will all have a slice of the data from the database.
A query comes in, the query is distributed to each node and you get the answers to your queries quickly and efficiently. As a result, this scales very well as the amount of data increases.
With Azure SQL Data Warehouse, you don’t have to select specific configuration of CPUs, RAM or storage. Compute power of the database is measured in Data Warehousing Units (DWU). You only have to worry about the amount of DWUs that you provision.
There is the notion of Gen 1 and Gen 2 data warehouses. Gen 1 data warehouses are measured in DWUs (Data Warehouse Units) and Gen 2 data warehouses are measured in cDWUs (Compute Data Warehouse Units).
The difference of these performance tiers is reflected on the invoice as unit of scale, which directly translates to billing. Both support scaling compute up or down and pausing the compute when you don’t need to use the data warehouse.
Here’s an example, let’s say I provision a data warehouse with 100 DWUs. I test it and I loaded 3 tables in 15 mins and rendered a report in 20 minutes. Then I adjust and increase my compute power to 500 DWUs and run the test again. This time I loaded 3 tables in 3 minutes and rendered a report in 4 minutes, thus a 5X improvement.
By changing your service level, you alter the number of DWUs that are allocated to the system and this in turn adjusts the performance and cost of your system.
Clearly, using Azure SQL Data Warehouse can increase your performance and efficiency, as well as save you money. If you’d like to learn more about leveraging this service in your business or if you have questions about any Azure service or product, we are your best resource. Click the link below or contact us—we’re here to help.
Leave a comment