When working with Power BI, sometimes the refresh speed is excruciatingly slow. This can seriously hamper the development time of your data analytics team. So what's the best way to improve the refresh? Here are some tips I've discovered that will help to speed it up.
Make the Power Query steps more efficient
OData Filtering
Native Query
1) Make the Power Query Steps More Efficient
The first culprit to look at is your steps in the Power Query editor. Power Query makes it easy to apply data transformations, filtering and what-not. However, with some easy modifications, you may be able to make them more efficient.
1A) The first thing you can do is to combine any filtering steps.
You may notice steps like "Filtered Rows", "Filtered Rows 1", etc..
We can combine these into just one step, so that Power Query doesn't need to do a filter function twice. 1) To do this, click on the last filter step you see.
2) Copy the filter logic from the formula bar. In the example below, it would be:
[Date] > #date(2020, 1, 30)
3) Paste this at the end of the first "Filtered Rows" step, separated by "and". For example:
[SalesPersonID] = 100 and [Date] > #date(2020, 1, 30)
4) Now just remove the last filter step (the one you just moved).
Congrats! Now you've improved your query efficiency by 1/n steps! Keep in mind that this is not limited to filtering functions. Changing Types, renaming columns and re-ordering columns are often leading contenders that can be combined into one step. Not enough? Read below for more.
1B) The next thing you can do is use a Reference Query
If you have multiple tables using the same source, you can create one, single query for the source. You can then reference that source query for your other tables. This allows Power Query to just load in the source one time instead of once per table. Put into a diagram, it would look like this:
As opposed to this:
1) To do this, make a duplicate of any tables in the Power Query editor by right clicking -> Duplicate.
2) On this duplicate, delete all of the steps except for the first step (which is probably named "Source"). Give it a good name like "DataSource".
3) Now, right click on this query and uncheck the "Enable Load". We do not want to actually load this into the dashboard.
4) Finally, for each table in your data model, replace the first step with this new query. For example, if you named your source "DataSource", you would type "DataSource" in the formula bar.
= DataSource
1C) Remove unnecessary columns.
If you are not doing analysis on 'hair color', you do not need to have a column for 'hair color'. Removing unused columns will help speed up the process. In the Power Query editor, simply right click on a column, and select Remove Column. If possible, try doing this as close to the source as possible.
2) OData Filtering
I was once working with a data set using an OData source that took 40 minutes to load into the data model. This was painfully slow and dragged the project development time down to a crawl. Fortunately, there is something called OData filtering. These are commands you can type at the end of the OData URL.
$filter
Context: $filter=yourColumnName eq yourvalue
Example:
rootURL/mycompany('sometextname')$filter=ProductID eq 1000
You can combine multiple filters with the 'and' keyword.
...('sometextname')$filter=ProductID eq 1000 and CustomerName eq 'Bob'
Another command you can use to improve the speed is by bringing in the columns you only need with the 'select' command.
$select
Context: $select=aColumnName, anotherColumnName
Example:
...('sometextname')$select=ProductID, CustomerName, SalesAmount, Date
Though OData is not limited to these two functions, they can make a significant improvement on your refresh speeds. Exploring all OData functions is outside the scope of this article, but you can learn all about them here: https://www.odata.org/getting-started/basic-tutorial
3) Native Queries
If you are accessing a database, you can utilize what is known as a 'native query'. This is simply the SQL code that Power Query uses to pull in the data (and 'masks' with its Power Query M language). If you are familiar with SQL, you may be able to analyze it yourself and write a more efficient SQL query than what Power Query generates.
The Power Query function for connecting to a SQL database is the following:
Sql.Database(server as text, database as text, optional options as nullable record) as table
What we are interested is the options parameter. Here we can pass in the SQL code directly by specifying Query = "SELECT .... FROM...". The format is as follows:
= Sql.Database("My Server", [Query="SELECT ... FROM..."])
Performance implications are big if you can do this. Of course, writing your own SQL query and applying filters there will remove the transparency to any other developers of what's going on if they are not familiar with SQL.
That's all for this post. Cheers!
Comments