The lack of a percentile function in DAX is a commonly brought up challenge. In order to replicate a version that matches what the Excel Percentile.INC (inclusive) function does, you have to jump through a few hoops. This process involves having to create several measures to hold some intermediate values to apply a final formula. In theory, you could do it all in one DAX expression, but it would very difficult to read and test. Here's a step-by-step guide for replicating Excel percentile in DAX.
The Microsoft Excel Method
Before we begin, you should probably know there are actually several methods to calculate a Percentile, and the Excel method is detailed on this Wikipedia page:
This method can be summed up by the following steps:
- Calculate the target rank of the percentile value you are looking for. So for example, if you were looking for the 25th percentile out of 100 values you would find the position using the formula:
- Grab the actual values in a ranked list for the next closest ranks as a integer value (i.e. whole number). So if your rank target from above was 25.5 then we need to know the values at rank 24 and rank 26. We'll call this IntegerHigh and IntegerLow.
- Now that we know those values we can use the formula that Excel uses to linearly interpolate what the value should be at that percentile.
As mentioned above for ease of being able to follow along, we will be using separate measures to accomplish this and just hide those measures from the client tools at the end.
First, here is a screenshot of the data that I will be working with:
This was just randomly generated numbers ("test scores") between 1-100 for 3 exams and 3 schools. We'll be showing how to make sure the percentile values obeys any slicing of the data that is done. This data was generated using Excel RANDBETWEEN functions and some INDEX functions to pick a random value from a list.
Second, we will import this data into a Power Pivot table and add the relevant AutoSum measures like average, min, max, etc. that might be useful for analysis.
Step for Replicating Excel Percentile in Dax
Now we are ready to implement the steps for the formula above.
1. To find the target rank for the 50th percentile we will use the following DAX expression.
Note here we are using the ID to make sure we aren't grabbing this at the table level but the ID level. So some unique identifier is needed in order to be able to tell these apart when the values end up in different slices. If you don't have one in your dataset you can generate some using Excel or if pulling from a SQL table use the unique key or add a ROWNUM() or something similar to your query.
2. To grab the IntegerHigh and Low we are going to use the TargetRank we derived in the last step and do a ROUNDUP and ROUNDDOWN to find the bounding ranks. We have to Rank our data (TOPN) in order to sort it to grab the proper bounding values.
[Sum of Test_Score],
[Sum of Test_Score]
[Sum of Test_Score],
[Sum of Test_Score]
Note here we also use a CALCULATETABLE on our ID. This is to make sure that we execute this in the context of each slice of data that our IDs belong to, otherwise this would only work if we were including all the values in the table in our data set.
3. Now that we have the High and Low values at the bounding ranks we just simply replicate the exact same formula that Excel uses to grab the linearly interpolated value.
Now that we have implemented all the required elements we will hide our helper functions and go test this out in a PivotTable and see if it matches Excel's equivalent and also if it readjusts to the slices. For my demo workbook, I did the 50th and 90th percentiles. Note that the 50th Percentile is the same as a median so we can do a median that will match Excels median as well. There are technically slightly easier ways to do the Median in DAX since it's just dividing the data into two halves but this method works just fine as well.
Excel formula I am trying to match:
PivotTable with no slicers:
PivotTable sliced by School and Test type:
Everything looks to be working and it matches Excel and does adjust when I add in the School and Test type (which gets harder to do with the Excel Percentile formula). I have tested this with a workbook with several hundred thousand rows and memory will go up a bit depending on how many of these that you are calculating but should still perform reasonably.
If you are interested in trying it out yourself, I have the demo workbook here: Excel 2013 Workbook
To learn more about Josh, please check out his blog
or follow him on Twitter
Do you have questions about Excel or DAX? Submit them in the comments section below and Josh will be happy to answer them!