Let us imagine the following scenario. We have multiple clients sharing their sales data with us. They store the data in dedicated folders that reside in different places. The number of clients, as well as the number of files for each client, will vary across periods.
We wish to create an automatic refresh system that will work no matter how many files and folders we input. Sound complicated? Sort of, but in this article, we will explain many PowerQuery features that can be useful in a variety of situations.
Building blocks of the solution:
- Custom functions
- Nested function
- Excel Parameter table
The idea is to show you how you can create building blocks from PQ and gradually increase complexity without getting lost in code. Therefore, utilizing the PowerQuery full potential!
Processing a whole folder is one of the most powerful features of Power Query. In case you are not familiar with this feature, you can check out this link to find the video in which we explained how to create a custom function and use it upon folder.
The idea
It is a common practice to save all similar files in a single folder and use the Power Query From Folder connector to process all files in a single query. But what if for some reason, we want to get files from different folders and process all of them in a single query? Is it possible to process multiple folders in one query? Yes, it is!
For demonstration purposes, we will use a fictional Toys company based on AdventureWorks demo database. You can download all the source files and the solution on the following link.
In a standard case, which is processing all files from a single folder, we would save the files in the folder as in the picture below:
In our case, we have 3 different folders (Region1,2 and 3), each containing multiple retailer files.
Creating a Function
Lets start building our solution. First, we need to connect to a single .xlsx file and select only one US Country, for example New Jersey.
Later, we need to convert script to a function.
let Source = Excel.Workbook(File.Contents("C:\RegionA\SuperToys.xlsx"), null, true), #"New Jersey_Sheet" = Source{[Item="New Jersey",Kind="Sheet"]}[Data], #"Promoted Headers" = Table.PromoteHeaders(#"New Jersey_Sheet", [PromoteAllScalars=true]), #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")), #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"), #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}}) in #"Renamed Columns"
We should remove the marked rows since they will change depending on the folders and files the function connects to. So, to convert this query to a function, we will remove the top 2 rows and add an input variable above the let keyword.
(InputTable as table)=> //()=> is a syntax for creating an input variable for the function let #"Promoted Headers" = Table.PromoteHeaders(InputTable, [PromoteAllScalars=true]), //in this step the InputTable variable is entering our function #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Row Labels] <> "Grand Total")), #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Row Labels"}, "Attribute", "Value"), #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Row Labels", "City"}, {"Attribute", "Type"}}) in #"Renamed Columns"
Let’s call this function FXClean. Now, let’s build the second layer of the solution that will involve invoking the function upon all the files in a single folder.
Invoking the Function
We will connect to RegionC folder which hosts 4 different Excel files of the similar structure (4 different retailers).
To clean all the files in this folder, we need to extract all the sheets from each folder (to access all countries retailers operate in) and then invoke FXClean function upon each Country.
Lastly, we need to expand [FXClean] column, and remove other unnecesary columns. Below is the current M code of the RegionC query.
let Source = Folder.Files("C:\RegionC"), #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}), #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)), #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}), #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}), #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])), #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"}) in #"Removed Columns"
Nested PowerQuery Functions
The only obstacle so far is that we need to duplicate this query for each Region. Or we can convert this query to a function and provide it with a list of folder paths.
Notice that in the second row we have a hardcoded path to the RegionC folder, so when converting this query to a function, we need to address the input variable to that step.
Also, notice that we are invoking our previously created custom function in the code that is itself becoming a function. this way we created a nested custom function. This is generally a prefered approach since it lowers the complexity of the overall code.
After converting the query above into a function (called FXCleanAllFolders) this is the code we have.
(InputPath as text)=> let Source = Folder.Files(InputPath), //InputPath is now dynamic and we only need to provide all the paths as a list or a table #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}), #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "ExcelSheets", each Excel.Workbook([Content],false)), #"Expanded ExcelSheets" = Table.ExpandTableColumn(#"Added Custom", "ExcelSheets", {"Item","Data" }, {"Item","Data"}), #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Expanded ExcelSheets", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}), #"Invoked Custom Function" = Table.AddColumn(#"Extracted Text Before Delimiter", "FXClean", each FXClean([Data])), //calling nested function for every outer iteration #"Expanded FXClean" = Table.ExpandTableColumn(#"Invoked Custom Function", "FXClean", {"City", "Type", "Value"}, {"City", "Type", "Value"}), #"Removed Columns" = Table.RemoveColumns(#"Expanded FXClean",{"Content", "Data"}) in #"Removed Columns"
Finishing the solution
Lastly, we need to create a table containing all the folder paths that will be used to create invoked function step upon.
For the purpose, we created one in Excel file.
Finally we can invoke our FXCleanAllFolders with the [Path] column as input.
The M code of the final step is a simple one.
let Source = Excel.CurrentWorkbook(){[Name="Regions"]}[Content], #"Invoked Custom Function" = Table.AddColumn(Source, "FXCleanAllFolders", each FXCleanAllFolders([Path])) in #"Invoked Custom Function"
Basically, both functions have an easy to follow steps, maintaining the code readability and keeping the low overall complexity. But when combined, they perform complex data manipulations.
Conclusion
In this article, we described a specific problem that occurs when we need to process multiple folders. Instead of processing folders one by one, and doing append afterward, we can use this technique to create a dynamic data cleanup, all in a single query. We used 2 custom functions and a helper table containing folder paths pointing to all folders that need to be processed. The idea of the article was to show you how, by combining multiple functions, you can create powerful Power Query scripts that can be used in many different scenarios.
Hope you enjoyed reading through this article. In case you have any questions, please post them below!