Split packs into single products in Power Query

This is going to be an easy-to-follow article on how to use PowerQuery to easily change source data granularity for only a subset of data. We used a client’s request to split packs into its components as an inspiration to write this post. You can also download Excel workbook and follow along.

Sometimes in the sales data, we can have sales quantity and value of packs with multiple products included. We will show how to split those packs into individual products and how to assign correct sales values and quantities to each individual product within the pack, based on a specific key.

The problem

The Fact table contains multiple products, but it also contains packs of products. For example, we can have Pack X which consists of 3 products: Product A, Product B, and Product C. The quantity of each product within a pack is different. There are 5 A Products, 10 B Products, and 15 C Products. We want to remove packs from the Fact table and only show granularity on the product level. We also need a method to equally split the total value of the pack into individual product values.

We will add another pack named Pack Y, with 10 Products B and 10 Products C. Now, let’s demonstrate how to split packs using our small sample (one day, two packs).

This is our Fact table. Instead of Pack X and Pack Y, we want to have individual products, sales quantity, and sales value for each product inside the pack. At the moment, in the ProductKey column, we have mixed product and pack keys. We want to isolate PackKeys and manipulate them exclusively while leaving real products intact.

Most of the time, when dealing with packs, clients have components list where we can find which products are a part of the pack SKU. Usually, they are in a table similar to the one below.


In this table, we have pack descriptions with pack quantity (sum of individual product quantities inside of a single pack), and a description of each component of the pack with their quantities and prices.

The idea

We would want to merge those 2 tables to get individual products for each pack that is present. To do that, first, we need to get both tables inside a Power Query (queries named FactTable and PacksTable).

The first step is to merge queries by using ProductKey (from the Fact table) and PackKey (from the Components list) columns. Merge is done using the left outer join. Then, we will expand the table with columns [PackQTY], [Product], [ProductKey], and [ProductQTY]. Now, we have a table that contains two columns with product key (ProductKey and ProductKey.1) and product name (ProductName and Product).


We will use expanded columns to create a new conditional column that will populate null values of the ProductKey.1 column with the value of ProductKey. That way, we will get a new column consisting of only ProductKeys.

This techique fails if you have a null value as a standard value in ProductKey column, which is something that should not happen and if it does, it should be cleaned before this step.

Now, we have the correct product keys and product names. To get the correct quantities, we will create a custom column in which we will multiply product quantities and sales quantities if they are packs.


The if part checks if the product is a pack, and if it is, multiplies SalesQTY by ProductQTY. ProductKey.1 is an expanded column, so it contains nulls for all products that are not packs. After we split the sales quantity, we get this table:

Splitting the Value Column


We can see that we have the correct quantities and product keys/names. The only thing left to do is to split the SalesValue column to get the correct sales value for each product. There are multiple ways of doing it, depending on the logic you need to apply. If you have products with the same or similar price, you can simply multiply sales value by the quantity share of different products inside the pack.

For example, Pack X contains 3 products with quantities 5,10, and 15. We will divide SalesValue (4000) and PackQTY (30), then multiply by ProductQTY (5,10 or 15). This means that each product gets a value proportional to its quantity share. In our case, this would be wrong because products A, B, C in Pack A all have different prices (10,15 and 20).

This means we need to apply different logic. We want to see what is the value share (instead of quantity share) of each product in our packs. This means that we need to get individual product values (multiply columns ProductQTY and ProductPrice) and divide it by pack value (sum of all individual product values). The easiest way to get the Value share of each product is to duplicate the Pack table (Components list) and do a group by PackKey to receive totals of each pack.

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "ProductValue", each [ProductPrice]*[ProductQTY]), //Get total value of each product
    #"Grouped Rows" = Table.Group(#"Added Custom", {"PackKey"}, {{"TotalValue", each List.Sum([ProductValue]), type number}}), //Sum values on PackKey granularity
    #"Changed Type" = Table.TransformColumnTypes(#"Grouped Rows",{{"PackKey", type text}})
in
    #"Changed Type"



Now we can merge totals into PacksTable to create [ValueShare] column.


When we get this value share in our fact table, we can now use it to split SalesValue column into individual product sales value.

Finishing the Solution (M Code)


Finally, we have the correct sales values. It would also be good to add a column that will determine whether the product is coming from packs or is a regular product. This would be useful to compare previous quantities and values with the new ones. We will add an additional column called PackOrRegular:


After we remove redundant columns and change column types, we get the new fact table:


We can see that we have the correct key, name, quantity, and sales value. As you can see in the comparison below, the sum of old sales value and new sales value needs to be the same, while the new quantity needs to be greater than the old quantity (since 1 pack contains more than 1 product).


The whole M code for this solution:

let
    Source = Excel.CurrentWorkbook(){[Name="FactTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"ProductKey", type text}, {"ProductName", type text}, {"SalesQTY", Int64.Type}, {"SalesValue", type number}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"ProductKey"}, PacksTable, {"PackKey"}, "PacksTable", JoinKind.LeftOuter),
    #"Expanded PacksTable" = Table.ExpandTableColumn(#"Merged Queries", "PacksTable", {"PackQTY", "Product", "ProductKey", "ProductQTY", "ValueShare"}, {"PackQTY", "Product", "ProductKey.1", "ProductQTY", "ValueShare"}),
    #"Added Conditional Column" = Table.AddColumn(#"Expanded PacksTable", "NewProductKey", each if [ProductKey.1] = null then [ProductKey] else [ProductKey.1]),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "NewProductName", each if [Product] = null then [ProductName] else [Product]),
    #"Added Custom" = Table.AddColumn(#"Added Conditional Column1", "NewSalesQTY", each if [ProductKey.1] <> null then [SalesQTY]*[ProductQTY] else [SalesQTY]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "NewSalesValue", each if [ProductKey.1] <> null then [SalesValue]*[ValueShare] else [SalesValue]),
    #"Added Conditional Column2" = Table.AddColumn(#"Added Custom1", "PackOrRegular", each if [ProductKey.1] = null then "Regular" else "Pack"),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Conditional Column2",{"Date", "NewProductKey", "NewProductName", "NewSalesQTY", "NewSalesValue","PackOrRegular"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns",{{"NewProductKey", type text}, {"NewProductName", type text}, {"NewSalesQTY", Int64.Type}, {"NewSalesValue", type number}, {"PackOrRegular", type text}})
in
    #"Changed Type1"


Thank you for reading!
If you have any comments, please leave them below!

Table of Content

Table of Content

LEARN POWER QUERY

Improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
4 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

DAX HANDBOOK
7. VARIABLES

Variables are used in almost every measure you will create. The reason we introduce them last is that they use all other parts of DAX code to produce faster, more powerful and maintainable code. Variables are like containers of a part of the DAX code which can be used throughout...

Read more

DAX HANDBOOK
6.8 ALLSELECTED

Explanation ALLSELECTED is one of the most complex functions in DAX. When used improperly it can lead to unexpected results. As a rule of thumb, you should not use it in iterative functions. It is preferable to use it only as a CALCULATE filter remover, not as a table function....

Read more

DAX HANDBOOK
6.7 Lineage

What is Lineage? Lineage is a part of DAX mechanics that enables us to use tables as filter arguments for the CALCULATE function. It’s also used in row2filter context transition and other aspects of the data model that involve filter propagation throughout the model. We can state that lineage is...

Read more

DAX HANDBOOK
6.6 Crossfilter

Crossfilter is a feature of DAX when it filters the underlying dataset even though there aren’t any visual filters present. Introduction In this example, we will explain a very important feature of CALCULATE filter arguments. We will also explain why you should always prefer the combination of ALL/REMOVEFILTER + VALUES...

Read more
4
0
Would love your thoughts, please comment.x
()
x
Scroll to Top