Share on facebook
Share on twitter
Share on linkedin

Incremental refresh using PowerBI pro and Dataflows

Krešimir Ledinski

Krešimir Ledinski

Microsoft certified expert in the field of Business Intelligence. His biggest passions are DAX, M, and data modeling.

 In this article, we will see how we can use the self-referencing technique and Dataflows to create incremental refresh with PowerBI Pro version. This technique is tested and working both with cloud and on-premises data sources. In case you are using on-premise data sources to access new data, you need to add on-premise source in the Enterprise gateway prior to the creation of dataflows.

Update: We’ve also created a video explaining this concept in details. You can check the youtube video here

 

Self Referencing table

 

Let’s explain, in simple terms, what a self-referencing table is based on the picture below. 

Self-referencing technique

 

Steps to produce a self-referencing table:

  1. New Data enters the Main Table,
  2. History Data also enters the Main Table,
    • Notice that History Table in step 4 is fueled from the Main Table, after step 3 is completed (in this step the self-reference is created)
  3. History and New Data are Appended into a single table,
  4. If the script of the Main table finishes successfully, History table gets larger for the Appended New Data part.

 

The process repeats with a refresh. So, for each following refresh, History Table grows to accumulate previous history + new data. The picture above is the key to understanding this article. If you feel lost at some point, come back to this part to thoroughly understand the technique (you can also check the link in the following paragraph).

So, to create a self-referencing table, we need a container for the historical data (History Table). Usually, this is an Excel table that is used to create self-reference applying a technique discussed in the following blog article. Using a similar approach but with dataflow acting as a container is a much better option since we can use an out-of-the-box scheduled refresh to keep our data up to date. In our example, Both Main and History tables are created with Dataflows.

 

Now let’s see how we can use this approach to create incremental refresh with PowerBI Pro Dataflows.

 

Solution with dataflows

 

To keep it simple, we will reproduce an on-premise data source using Excel file saved on our local drive (this could be any online or on-premise data source). Our Excel has only one table containing 3 columns. Date column will be used as a key column for appending new data.

 

Excel data source

 

Process:

  1. Create a Main table inside Dataflows.

    1. DataFlow Query is: (If you wish to follow along, you need to create a similar table in Excel named Source, save it to your local drive and change the folder location in the source step. You also need to add your Excel location under an on-premise gateway sources).

      let

         Source = Excel.Workbook(File.Contents(“C:\Users\kresi\Desktop\TestGateway\test.xlsx”), nulltrue),

         Navigation = Source{[Item = “Source”, Kind = “Table”]}[Data]

      in

         Navigation



      Main table in Dataflows


  2. Connect to the Main table Dataflow from PBI desktop using Power BI dataflows Connector (You can also connect directly in Dataflows, although the process is often quite slow). 

    Connect to Dataflows

    1. Use the M code created in step 2 to create History Dataflow.

    2. Set History query not to load. This way we will avoid linked entity which is reserved only for premium capacity.


      History table in Dataflows



  3. Turn the Main table into a self-referencing table.

    There are 4 parts in the Main Table query that needs to be modified. Each part is explained in details below.


      let
      Source = Excel.Workbook(File.Contents(“C:\Users\kresi\Desktop\TestGateway\test.xlsx”), nulltrue),
    1 Navigation = Source{[Item = “Source”, Kind = “Table”]}[Data],
       
      NewData = Navigation,
    2 AMinDate = List.Min(NewData[Date]),
    3 FilteredHistory = Table.SelectRows(History, each [Date] < AMinDate),
    4 CombineOldNew = Table.Combine({FilteredHistory, NewData})
      in
      CombineOldNew


      1. Part containing the script to acquire new data. This is the step in which we determine how many days in the past we want to retrieve. This could be, for instance, a SQL statement that returns the last 5 days of sales from a database.

      2. AMinDate variable containing the first available date of the new data. In case of SQL statement, it will, for each refresh, return the oldest of the last 5 days of sales. This date will be used to filter out history table so that no duplicate values occur.

      3. Filter History Table based on AMinDate variable. Since we will be running the same script every day, we need to filter out days that are already included in the NewData part of the script. AMinDate variable will help us achieve that. In this step a self-referenced table is created.

      4. Append NewData to HistoryData.


After the 4th step, if the refresh is successful, History Table automatically grows to include NewData.


It is important to understand that History query is referencing Main query. Because of that, History is refreshed only after the refresh of the Main Table succeeds. This enables History to automatically grow and include new data after each refresh.

 

Bringing up-to-date History 

 

If we want to bring full history to dataflows before setting to incremental refresh, we have 2 choices.

  1. Bring history before the start of an incremental refresh to a separate DataFlow, then append them in PowerQuery (in PBI Desktop).
  2. Run an up-to-date full DataFlow refresh once (using the script above), then change the 1st part of the script to a shorter timeframe. 


After the initial up to date load, we can set each following NewData load to acquire only the last couple of days. AMinData variable will make sure that no duplicate values enter History table prior to append. We should set NewData to load at least the last couple of days, to have time to correct Query in case it fails to refresh. Refresh details in the picture below.

 

  1. Incremental Refresh

 

 

Incremental Refresh based on New ID-s 

 

In case we need to append new data based on the ID column values (not on last available dates), we can use the RightAnti approach to exclude all new IDs in HistoryData. M code below should get you started.

 

let

   Source = Excel.Workbook(File.Contents(“C:\Users\kresi\Desktop\TestGateway\test.xlsx”), nulltrue),

   Navigation = Source{[Item = “Source”, Kind = “Table”]}[Data],

 

   NewData = Navigation,

   AColumnNames = Table.ColumnNames(NewData as table),

   MergeQueries = Table.NestedJoin(NewData, {“id”}, History, {“id”}, “History”, JoinKind.RightAnti),

   RemoveOldColumns = Table.RemoveColumns(MergeQueries, AColumnNames),

   ExpandedHistory = Table.ExpandTableColumn(RemoveOldColumns, “History”, AColumnNames, AColumnNames),

   CombineOldNew = Table.Combine({ExpandedHistory, NewData })

in

   CombineOldNew

 

 

 

AColumnNames – list variable containing names of all the columns in a table,

id – name of the column which holds unique IDs that need to be added,

RighAnti will remove all IDs from HistoryTable that are already available in the new data.

 

Hope these techniques will help you ease the burden on the on-premise sources when refreshing data. It can also be useful to save history data in case of accessing APIs with limited API calls. 

 

 

In case of any questions, please ask. If you enjoyed reading through the article, please like/share!

 

Share on facebook
Share on linkedin
Share on twitter

LEARN POWER QUERY

Learn and improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
18 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

Sorting in Matrix created with calculation groups (PowerBI)

This is the second part of the blog series dedicated to showing workarounds for currently unsupported matrix visual features. In the first part, we explained how to add conditional formatting only for the specified columns, and in this article, we will explain how to introduce sorting. Below, you can test...

Read more

How to track changes in the table in Power Query

If you ever had a need to check if anything changed in the new table compared to the previous export, no matter in which column, then this is the article for you. We have created an M script in Excel that uses the left anti join to check for data...

Read more

Split packs into single products in Power Query

This is going to be an easy to follow article on how to use PowerQuery to easily change source data granularity for only a subset of data. We used a client’s request to split packs into its components as an inspiration to write this post. You can also download Excel...

Read more
18
0
Would love your thoughts, please comment.x
()
x
Scroll to Top