How to process multiple folders with a single PowerQuery script

Let us imagine the following scenario. We have multiple clients sharing their sales data with us. They store the data in dedicated folders that reside in different places. The number of clients, as well as the number of files for each client, will vary across periods.

We wish to create an automatic refresh system that will work no matter how many files and folders we input. Sound complicated? Sort of, but in this article, we will explain many PowerQuery features that can be useful in a variety of situations.

Building blocks of the solution:

  • Custom functions
  • Nested function
  • Excel Parameter table

The idea is to show you how you can create building blocks from PQ and gradually increase complexity without getting lost in code. Therefore, utilizing the PowerQuery full potential!

Processing a whole folder is one of the most powerful features of Power Query. In case you are not familiar with this feature, you can check out this link to find the video in which we explained how to create a custom function and use it upon folder.


The idea

It is a common practice to save all similar files in a single folder and use the Power Query From Folder connector to process all files in a single query. But what if for some reason, we want to get files from different folders and process all of them in a single query? Is it possible to process multiple folders in one query? Yes, it is!

For demonstration purposes, we will use a fictional Toys company based on AdventureWorks demo database. You can download all the source files and the solution on the following link.

In a standard case, which is processing all files from a single folder, we would save the files in the folder as in the picture below:


In our case, we have 3 different folders (Region1,2 and 3), each containing multiple retailer files.


Creating a Function

Lets start building our solution. First, we need to connect to a single .xlsx file and select only one US Country, for example New Jersey.

To clean the selected table, we need to filter out “Grand Total” row and unpivot other columns except [Row Labels].

Later, we need to convert script to a function.

let
    Source = Excel.Workbook(File.Contents("C:\RegionA\SuperToys.xlsx"), null, true),
    #"New Jersey_Sheet" = Source{[Item="New Jersey",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(#"New Jersey_Sheet", [PromoteAllScalars=true]),
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}})
in
    #"Renamed Columns"

We should remove the marked rows since they will change depending on the folders and files the function connects to. So, to convert this query to a function, we will remove the top 2 rows and add an input variable above the let keyword.

(InputTable as table)=> //()=> is a syntax for creating an input variable for the function

let
    #"Promoted Headers" = Table.PromoteHeaders(InputTable, [PromoteAllScalars=true]), //in this step the InputTable variable is entering our function
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}})
in
    #"Renamed Columns"

Let’s call this function FXClean. Now, let’s build the second layer of the solution that will involve invoking the function upon all the files in a single folder.

Invoking the Function

We will connect to RegionC folder which hosts 4 different Excel files of the similar structure (4 different retailers).

To clean all the files in this folder, we need to extract all the sheets from each folder (to access all countries retailers operate in) and then invoke FXClean function upon each Country.

Lastly, we need to expand [FXClean] column, and remove other unnecesary columns. Below is the current M code of the RegionC query.

let
    Source = Folder.Files("C:\RegionC"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)),
    #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])),
    #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"})
in
    #"Removed Columns"

Nested PowerQuery Functions

The only obstacle so far is that we need to duplicate this query for each Region. Or we can convert this query to a function and provide it with a list of folder paths.

Notice that in the second row we have a hardcoded path to the RegionC folder, so when converting this query to a function, we need to address the input variable to that step.

Also, notice that we are invoking our previously created custom function in the code that is itself becoming a function. this way we created a nested custom function. This is generally a prefered approach since it lowers the complexity of the overall code.

After converting the query above into a function (called FXCleanAllFolders) this is the code we have.

(InputPath as text)=>

let
    Source = Folder.Files(InputPath),  //InputPath is now dynamic and we only need to provide all the paths as a list or a table
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)),
    #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])), //calling nested function for every outer iteration
    #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"})
in
    #"Removed Columns"


Finishing the solution

Lastly, we need to create a table containing all the folder paths that will be used to create invoked function step upon.

For the purpose, we created one in Excel file.

Finally we can invoke our FXCleanAllFolders with the [Path] column as input.

The M code of the final step is a simple one.

let
    Source = Excel.CurrentWorkbook(){[Name="Regions"]}[Content],
    #"Invoked Custom Function" = Table.AddColumn(Source, "FXCleanAllFolders", each FXCleanAllFolders([Path]))
in
    #"Invoked Custom Function"

Basically, both functions have an easy to follow steps, maintaining the code readability and keeping the low overall complexity. But when combined, they perform complex data manipulations.


Conclusion

In this article, we described a specific problem that occurs when we need to process multiple folders. Instead of processing folders one by one, and doing append afterward, we can use this technique to create a dynamic data cleanup, all in a single query. We used 2 custom functions and a helper table containing folder paths pointing to all folders that need to be processed. The idea of the article was to show you how, by combining multiple functions, you can create powerful Power Query scripts that can be used in many different scenarios.

Hope you enjoyed reading through this article. In case you have any questions, please post them below!

Table of Content

Table of Content

LEARN POWER QUERY

Improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
3 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

DAX HANDBOOK
7. VARIABLES

Variables are used in almost every measure you will create. The reason we introduce them last is that they use all other parts of DAX code to produce faster, more powerful and maintainable code. Variables are like containers of a part of the DAX code which can be used throughout...

Read more

DAX HANDBOOK
6.8 ALLSELECTED

Explanation ALLSELECTED is one of the most complex functions in DAX. When used improperly it can lead to unexpected results. As a rule of thumb, you should not use it in iterative functions. It is preferable to use it only as a CALCULATE filter remover, not as a table function....

Read more

DAX HANDBOOK
6.7 Lineage

What is Lineage? Lineage is a part of DAX mechanics that enables us to use tables as filter arguments for the CALCULATE function. It’s also used in row2filter context transition and other aspects of the data model that involve filter propagation throughout the model. We can state that lineage is...

Read more

DAX HANDBOOK
6.6 Crossfilter

Crossfilter is a feature of DAX when it filters the underlying dataset even though there aren’t any visual filters present. Introduction In this example, we will explain a very important feature of CALCULATE filter arguments. We will also explain why you should always prefer the combination of ALL/REMOVEFILTER + VALUES...

Read more
3
0
Would love your thoughts, please comment.x
()
x
Scroll to Top