Share on facebook
Share on twitter
Share on linkedin

How to process multiple folders with a single PowerQuery script

Krešimir Ledinski

Krešimir Ledinski

Microsoft certified expert in the field of Business Intelligence. His biggest passions are DAX, M, and data modeling.

Share on facebook
Share on twitter
Share on linkedin

Let us imagine the following scenario. We have multiple clients sharing their sales data with us. They store the data in dedicated folders that reside in different places. The number of clients, as well as the number of files for each client, will vary across periods.

We wish to create an automatic refresh system that will work no matter how many files and folders we input. Sound complicated? Sort of, but in this article we will explain many PowerQuery features that can be useful in a variety of situations.

Building blocks of the solution:

  • Custom functions
  • Nested function
  • Excel Parameter table

The idea is to show you how you can create building blocks from PQ and gradually increase complexity without getting lost in code. Therefore, utilizing the PowerQuery full potential!

Processing a whole folder is one of the most powerful features in Power Query. In case you are not familiar with this feature, you can check out this link to find the video in which we explained how to create a custom function and use it upon folder.


The idea

It is a common practice to save all similar files in a single folder and use the Power Query From Folder connector to process all files in a single query. But what if for some reason, we want to get files from different folders and process all of them in a single query? Is it possible to process multiple folders in one query? Yes, it is!

For demonstration purposes, we will use a fictional Toys company based on AdventureWorks demo database. You can download all the source files and the solution on the following link.

In a standard case, which is processing all files from a single folder, we would save the files in the folder as in the picture below:


In our case, we have 3 different folders (Region1,2 and 3), each containing multiple retailer files.


The solution

Lets start building our solution. First, we need to connect to a single .xlsx file and select only one US Country, for example New Jersey.

To clean the selected table, we need to filter out “Grand Total” row and unpivot other columns except [Row Labels].

Later, we need to convert script to a function.

let
    Source = Excel.Workbook(File.Contents("C:\RegionA\SuperToys.xlsx"), null, true),
    #"New Jersey_Sheet" = Source{[Item="New Jersey",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(#"New Jersey_Sheet", [PromoteAllScalars=true]),
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}})
in
    #"Renamed Columns"

We should remove the marked rows since they will change depending on the folders and files the function connects to. So, to convert this query to a function, we will remove the top 2 rows and add an input variable above the let keyword.

(InputTable as table)=> //()=> is a syntax for creating an input variable for the function

let
    #"Promoted Headers" = Table.PromoteHeaders(InputTable, [PromoteAllScalars=true]), //in this step the InputTable variable is entering our function
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}})
in
    #"Renamed Columns"

Let’s call this function FXClean. Now, lets build the second layer of the solution that will involve invoking the function upon all the files in a single folder.

We will connect to RegionC folder which hosts 4 different Excel files of the similar structure (4 different retailers).

To clean all the files in this folder, we need to extract all the sheets from each folder (to access all countries retailers operate in) and then invoke FXClean function upon each Country.

Lastly, we need to expand [FXClean] column, and remove other unnecesary columns. Below is the current M code of the RegionC query.

let
    Source = Folder.Files("C:\RegionC"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)),
    #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])),
    #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"})
in
    #"Removed Columns"

The only obstacle so far is that we need to duplicate this query for each Region. Or we can convert this query to a function and provide it with a list of folder paths.

Notice that in the second row we have a hardcoded path to the RegionC folder, so when converting this query to a function, we need to address the input variable to that step.

Also, notice that we are invoking our previously created custom function in the code that is itself becoming a function. this way we created a nested custom function. This is generally a prefered approach since it lowers the complexity of the overall code.

After converting the query above into a function (called FXCleanAllFolders) this is the code we have.

(InputPath as text)=>

let
    Source = Folder.Files(InputPath),  //InputPath is now dynamic and we only need to provide all the paths as a list or a table
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)),
    #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])), //calling nested function for every outer iteration
    #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"})
in
    #"Removed Columns"


Finishing the solution

Lastly, we need to create a table containing all the folder paths that will be used to create invoked function step upon.

For the purpose, we created one in Excel file.

Finally we can invoke our FXCleanAllFolders with the [Path] column as input.

The M code of the final step is a simple one.

let
    Source = Excel.CurrentWorkbook(){[Name="Regions"]}[Content],
    #"Invoked Custom Function" = Table.AddColumn(Source, "FXCleanAllFolders", each FXCleanAllFolders([Path]))
in
    #"Invoked Custom Function"

Basically, both functions have an easy to follow steps, maintaining the code readability and keeping the low overall complexity. But when combined, they perform complex data manipulations.


Conclusion

In this article, we described a specific problem that occurs when we need to process multiple folders. Instead of processing folders one by one, and doing append afterward, we can use this technique to create a dynamic data cleanup, all in a single query. We used 2 custom functions and a helper table containing folder paths pointing to all folders that need to be processed. The idea of the article was to show you how, by combining multiple functions, you can create powerful Power Query scripts that can be used in many different scenarios.

Hope you enjoyed reading through this article. In case you have any questions, please post them below!

Share on facebook
Share on linkedin
Share on twitter

LEARN POWER QUERY

Learn and improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

How to process multiple folders with a single PowerQuery script

Let us imagine the following scenario. We have multiple clients sharing their sales data with us. They store the data in dedicated folders that reside in different places. The number of clients, as well as the number of files for each client, will vary across periods. We wish to create...

Read more

Dynamically expand table or record columns in Power Query

In this article, we will show how to solve a common problem of expanding nested tables or records in Power Query, when the column names of the nested objects vary in numbers. In the second part of the article, we will show a more robust technique for dealing with changing...

Read more

Sorting in Matrix created with calculation groups (PowerBI)

This is the second part of the blog series dedicated to showing workarounds for currently unsupported matrix visual features. In the first part, we explained how to add conditional formatting only for the specified columns, and in this article, we will explain how to introduce sorting. Below, you can test...

Read more
0
Would love your thoughts, please comment.x
()
x
Scroll to Top