Share on facebook
Share on twitter
Share on linkedin

Merging with date range using Power Query

Kristian Radoš

Kristian Radoš

Experienced data analyst. Advanced in SQL, PowerApps and M language.

Share on facebook
Share on twitter
Share on linkedin

In this blog, we will see how to merge two tables in Power Query based on key column and a date range. Merging with multiple matching columns is straightforward and can be achieved in Power Query by selecting merge queries option and holding CTRL key when choosing matching columns like in the picture below:

In the picture above, we are merging two tables based on matching columns: Brand and Category. In this case, both Brand and Category must be an exact match. What we want to observe is a different problem, i.e. merging based on one matching column and a range of values from another column (in this example a range of dates).

The problem

In the picture above, we can see two tables we want to merge – Sales and Promotions. We want to get the Promotion column in the Sales table. The merge requires a match on the Brand column, but we also want to take into account the date range in which the promotion happened. The date range is given in the Promotions table with DateFrom as the lower boundary and DateUntil as the upper boundary.

For example, the date of the first row of the Sales table is May 22nd, and in this period there was no promotion. In the third row, we have July 1st, which is inside the SummerPromo (between 6/1/2020 and 9/1/2020).
In order to merge tables correctly, we need to use Brand and Date columns from the Sales table, and Brand, DateFrom and DateUntil columns from the Promotions table. To solve this problem, we will use nested environments and the full syntax of the “each“ keyword. In case you are not familiar with these topics, check out this blog in which we covered those topics in more detail.

The solution

We cannot use the standard merge option (Table.NestedJoin() function) as a solution for the problem since the Date column is not the exact (or even approximate) match of DateUntil or DateFrom. The Date column must be after or equal to DateFrom, and before or equal to DateUntil. Along with this, the Brand column of the Sales table must be an exact match to the Promotions Brand column. One way to solve this problem is to add a custom column that will use Table.SelectRows() function. This means combining Table.AddColumn() and Table.SelectRows(), which are both iterative functions. Table.AddColumn() is used to add a column to the Sales table that contains matching rows from the Promotions table. Table.SelectRows() is used to filter the Promotions table to select only the matching rows for each row in the Sales table.

Below, we will try to add a custom column and write an M code with the “each” syntax to get the rows from Promotions into the Sales:

#"Added Custom" = 
Table.AddColumn(Sales, "Promo", 
                each Table.SelectRows(Promotions, 
                                      each [Brand]=[Brand] and [Date]>=[DateFrom] and [Date]<[DateUntil]
                                     )
                )

In this step, we added the column in the Sales table with all 3 conditions (Brand from Sales equal Brand from Promotions, Date>=DateFrom and Date<=DateUntil).

When we hit enter, we get the nested tables, but inside the tables we get an error:

The error states that the Date field of the record was not found. As seen from the picture, Date column clearly exists. What is the problem then?

If we observe the Added Custom step, we can see that the “each” keyword appears multiple times. If we write the full syntax instead of sugar syntax and replace each with a variable named X (you can find more on each syntax here), we get the following M code:

#"Added Custom" = 
Table.AddColumn(Sales, "Promo", 
               (X)=>  Table.SelectRows(Promotions, 
                                       (X)=> X[Brand]= X[Brand] and X[Date]>= X[DateFrom] and X[Date]<X[DateUntil]
                                       )
               )

Since each is a sugar syntax for an unnamed variable, in the full syntax we introduced a variable called X. Because the variable X is introduced two times, we have X in front of all the columns inside the inner iteration (Table.SelectRows() function). Inside the inner environment (Table.SelectRows() function), X refers to the Promotions table, which is the first argument of the Table.SelectRows() function. Inside the Promotions table, there is no Date column.  The error states exactly that.

If we use the full syntax, we can introduce 2 different variables, one called S (the outer variable, S stand for Sales) and P (the inner variable, P stands for Promotions).

#"Added Custom" = 
Table.AddColumn(Sales, "Promo", 
               (S)=>  Table.SelectRows(Promotions, 
                                      (P)=> P[Brand]= S[Brand] and S[Date]>= P[DateFrom] and S[Date]<P[DateUntil]
                                      )
                )

By using the full syntax of each keyword, we introduced two different variables, referring to two different environments. In this way, for each row of the Sales table, we get nested tables with matching rows from the Promotions table. When we expand the Promotion column, we get the correct final result:

Since we are doing nested iterations, when working with large tables, we can encounter performance issues. For optimizing the load speed of the query, we can use Table.Buffer() function prior to “merge” to buffer the inner table (Promotions). The full M code for this solutions is given below:

let
    BufferedPromotions = Table.Buffer(Promotions),
    Merge = Table.AddColumn(Sales, "Promo", 
                            (S)=>  Table.SelectRows(BufferedPromotions, 
                                                    (P)=> P[Brand]= S[Brand] and S[Date]>= P[DateFrom] and S[Date]<P[DateUntil])
                           ),
    #"Expanded Promo" = Table.ExpandTableColumn(Merge, "Promo", {"Promotion"}, {"Promotion"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Promo",{{"Promotion", type text}})
in
    #"Changed Type1"

Hope you enjoyed reading through this article. In case you have any questions, please post them below!

Share on facebook
Share on linkedin
Share on twitter

LEARN POWER QUERY

Learn and improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

Merging with date range using Power Query

In this blog, we will see how to merge two tables in Power Query based on key column and a date range. Merging with multiple matching columns is straightforward and can be achieved in Power Query by selecting merge queries option and holding CTRL key when choosing matching columns like...

Read more

How to process multiple folders with a single PowerQuery script

Let us imagine the following scenario. We have multiple clients sharing their sales data with us. They store the data in dedicated folders that reside in different places. The number of clients, as well as the number of files for each client, will vary across periods. We wish to create...

Read more

Dynamically expand table or record columns in Power Query

In this article, we will show how to solve a common problem of expanding nested tables or records in Power Query, when the column names of the nested objects vary in numbers. In the second part of the article, we will show a more robust technique for dealing with changing...

Read more

Sorting in Matrix created with calculation groups (PowerBI)

This is the second part of the blog series dedicated to showing workarounds for currently unsupported matrix visual features. In the first part, we explained how to add conditional formatting only for the specified columns, and in this article, we will explain how to introduce sorting. Below, you can test...

Read more
0
Would love your thoughts, please comment.x
()
x
Scroll to Top