List.Generate() and Looping in PowerQuery

In this article, we will explain an advanced topic about the M language – using List.Generate to loop over a table. We will combine our knowledge of lists, records, functions, iterations, and looping techniques. Basically, every important object in PowerQuery.


Do-while Loop in PowerQuery

Do while loop is commonly used in object programming. The idea is to create a loop that is being executed until it reaches a certain condition. The following is a simple VBA code representation.

X=1
Do while X < 10
X=X+1
Range(“A” & X ).Value = X
Loop

If we execute this code in Excel, we will receive the following list of values. The code iterated through X values until it reached value 10. In that step, the condition in do-while row evaluated to false therefore aborting further iterations.

While looping is used regularly in object programming, in functional programming (which M is a part of), looping is not so easy to implement nor to understand. Nevertheless, there are techniques to implement recursive or looping functions, and using a List.Generate() function is one of them.

Note to mention: these techniques, due to their iterative nature, tend to become very slow on large datasets or many iterations. Since PowerQuery is not optimized for those operations, you should rely on them only as a last available option.

That does not mean you should never use them, because they are extremely powerful if used properly.


How to Create Looping in PowerQuery and General Use


The general use of these functions is in dynamic outputs, where we do not know how many times our function should iterate to collect all the data.

Use cases:

  • API-s with previous/next page structure,
  • API-s with a limited number of concurrent calls,
  • Web pages with pagination,
  • Other structures in which we are unaware of the definite number of iterations.

Mostly used techniques to create iterations are:

  • Recursive technique
  • List.Generate() function
  • List.Accumulate() function


Recursive Technique


The recursive technique is using @ sign to access the same function from itself until it reaches a certain criterion. A simple representation is given below.
The function is receiving only one argument, X, and is calling itself in the else argument of if statement (the @Function keyword).


If we feed this function with a number 1, it will produce 4 iterations.

1st iteration

  • X that is entering function=1 which, is lower than 10, so the function will start again, receiving number 1 and multiplying it by 3. (row with else keyword)

2nd iteration

  • X that is entering function =3, which is lower than 10, so the function will start again, receiving number 3 and multiplying it by 3. (row with else keyword)

3rd iteration

  • X that is entering function =9, which is lower than 10, so the function will start again, receiving number 9 and multiplying it by 3. (row with else keyword)

4th iteration

  • X that is entering function =27 which is higher than 10. At this step the condition is evaluated to true and the X (27) is returned as an output of the function. (then X row)

This is how we can introduce recursion manually with an @ sign. There are also certain M functions that can give us a framework for looping. Those are List.Accumulate() and List.Generate().


List.Generate()


In this topic we will talk about List.Generate(). First, let’s see what are the advantages of using this function against the classic recursion with @ sign.

  1. Better readability and given framework
  2. Better performance
  3. Unlimited call stacks

What does it mean to have unlimited call stacks?

While doing recursions, each additional iteration is added to the total stack of all iterations. This stack cannot grow unlimited in size, so at a certain point, the stack will become full and the procedure will stop. This is what can happen with classic @ recursion, but with List.Generate() you are only limited by your computer memory. So, with List.Generate() you do not need to worry about the call stacks which can potentially stop the execution of your query.


Business Case Used to Explain the Function


We will explain this function on a simple example with an Excel file (you can also find the final Excel solution here). In it, we have multiple sheets, each with the structure like in the picture below.

We want to transpose values on each sheet and append them one below the other. This Excel will grow in size and new sheets will be added in appropriate order form 1 to x. We want to iterate through those sheets and apply our function until we reach the non-existing sheet in the Excel file (in our example that would be sheet5).

This task is only for demonstration purposes and I know that there are many better ways of handling this query. But once you understand this function thoroughly you can use it to solve real business cases mentioned at the begging of the article.

This is in general a great example of M coding, since in it we will use both list and record types of M objects, and each keyword, which is a sugar syntax for a simple function which accepts the current row as an argument! We also have an iteration, so with this example we will cover most of the advanced M coding!

In case you feel insecure about using records, lists and each keyword, please revise those fields before continuing with this lecture.


Building Blocks of the Final Solution


Now we will go through all the objects we will use to form the final solution with List.Generate() function.

Function Used to Transform Each Excel Sheet

We will use the following function to unpivot each sheet in a workbook.

(InputNumber)=>
let
Source = Excel.Workbook(File.Contents("C:\Generate\Source.xlsx"), null, true), //Fixed path to the file
SheetToTransform = Source{[Item="Sheet"& Number.ToText(InputNumber),Kind="Sheet"]}[Data], //InputNumber variable will concatenate with text Sheet to form the Sheet reference
#"Transposed Table" = Table.Transpose(SheetToTransform) //Table transposing
in
#"Transposed Table"

At the moment, we should focus solely on the SheetToTransform step, in which we are injecting the InputNumber variable to perform concatenation with the Sheet keyword. On each iteration, this number will increment by 1. The First 3 iterations are presented below.

  1. SheetToTransform = Source{[Item=”Sheet1”,Kind=”Sheet”]}[Data],
  2. SheetToTransform = Source{[Item=”Sheet2”,Kind=”Sheet”]}[Data],
  3. SheetToTransform = Source{[Item=”Sheet3”,Kind=”Sheet”]}[Data],
  4. ….

The idea is to iterate this function until it reaches a non-existing sheet. In our example that would be Sheet5. After it reaches Sheet5, the function should evaluate to false, resulting in exiting from the iterations and retrieving all the tables from previous successful steps of execution.

This is where List.Generate() comes in as a building block for this solution.


List.Generate() Overview


First, let’s get familiar with the syntax of the function.

List.Generate(
initial, Starting position of the function
condition, Condition evaluated for each iteration
next, Next row in a list
optional selector) if the condition is evaluated to true, take that result and do something additional with it.

For me, it was hard to grasp what each of these parameters means until I started to think in the M row context. The most important part to understand is that the List.Generate() is an iterative function, meaning that it produces a so-called row context. In that row context, in each iteration, there are always 2 rows present, the current row and the next row.
We will demonstrate this with a simple List.Generate() function.

List.Generate(
()=>1,
each _<7, each _+2,
each _*_
)

If we run an expression like the one above, we will receive the following list. What follows is the explanation of the execution steps.


()=>1

This is the initial state before the function starts iterating and creating a list. ()=> is a syntax of a function definition, meaning that List.Generate() only accepts functions as parameters. 1 is the starting input that will change in each iteration based on the 3rd argument. Besides scalars like numbers or text, starting input can also be in the form of structured objects, like tables, lists, or records (more on that in the following example). The result of this argument (in the form of a current row) is going to be used within all the other arguments of the List.Generate() function. We will access this value with the underscore (_) symbol.

each _<7,

as we can see, the keyword each is present in all other parameters of the function. Each is a sugar syntax for an unnamed variable that is holding the current row in an iteration. The following line could be written as
(_)=> _<7
Where underscore represents the row from the current iteration of the list (the initial state).
This argument is taking the initial state of the current iteration, and then does the comparison with the condition, if the condition is true, it stores that result in the current row of the list or pushes it to the optional 4th argument for further processing. If the condition evaluates to false, then the function execution stops and returns everything that was stored up until the last iteration that resulted in false.

each _+2,

This is where it starts to become interesting. This argument is creating an additional (next) row in the list, then pushes the result of its evaluation as the 1st argument in the next row iteration. At the moment the result is pushed to the 1st argument, the iteration starts again and the condition part kicks in checking how the next row result evaluates against the condition. In this process, looping occurs until the condition is evaluated to false.

each _*_

(Optional argument)
If the condition in the 2nd argument of the current iteration is evaluated to true, that result is pushed to the last optional argument for further processing. In case we omit the last argument, then the result of the 2nd argument is left as is in the current row of the list.

I’m not sure when PowerQuery engine internally evaluates the 4th argument (inside each iteration or after the list is created), but I believe this is irrelevant to understanding the function principles.

The order of the execution is as follows.

If we apply that to our list, this is what is happening in each iteration.
We replaced _ (underscore) with red numbers to emphasize what is happening in each iteration.

  • As we can see, the initial state (starting position) is always entering the functions and remains stable for the duration of the current iteration.
  • For the first iteration, the starting position is the one that we explicitly gave as the first argument of the List.Generate() function.
  • In the second iterations, the initial state is altered by the next row expression (3rd argument), then the loop starts again.
  • As the function is looping, it is generating the List output as shown on the right side of the picture.
  • In the last iteration, the starting position becomes 7, which in the second argument evaluates to false, therefore aborting further iterations and returning the list as it was in the previous step of execution.


List.Generate() Implementation With an Excel File


Now that we have become acquainted with the List.Generate() function, let’s see how we can utilize it to perform the task from the beginning of the article.

So, we have an unknown number of sheets that we need to transform and we wish to use the List.Generate() function to store all transformed sheets until we hit a non-existing one. After that, the function should stop and return the transformed tables in the form of nested tables inside of a list.

We know the function we will use from the beginning of the article.

(InputNumber)=>
let
Source = Excel.Workbook(File.Contents("C:\Generate\Source.xlsx"), null, true), //Fixed path to the file
SheetToTransform = Source{[Item="Sheet"& Number.ToText(InputNumber),Kind="Sheet"]}[Data], //InputNumber variable will concatenate with text Sheet to form the Sheet reference
#"Transposed Table" = Table.Transpose(SheetToTransform) //Table transposing
in
#"Transposed Table"

Now, let’s observe how to implement the solution to dynamic number of sheets with List.Generate().


M Code of the solution

let
Source =List.Generate(
()=>[Sheet=1,Funct=FXSheet(1)],
each (try [Funct])[HasError]=false,
each [Sheet=[Sheet]+1,Funct=FXSheet([Sheet]+1)]),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"Sheet", "Funct"}, {"Sheet", "Funct"}),
#"Expanded Funct" = Table.ExpandTableColumn(#"Expanded Column1", "Funct", {"Column1", "Column2"}, {"Column1", "Column2"})
in
#"Expanded Funct"

Let’s focus on the first step in the query which consists of the List.Generate() function. Now, let’s explain each argument of the function.

()=>[Sheet=1,Funct=FXSheet(1)]

As we mentioned earlier, initial state can be any scalar value, or structured type of object like tables, records or lists. In our case, we are using the square brackets syntax to create a record type object holding 2 columns named Sheet and Funct. Sheet column has the starting value of 1, and the Funct column starting value is a function invoke with an argument of 1. FXSheet is the name of our manually created function, as shown in the picture below.

each (try [Funct])[HasError]=false

This step is taking the record from the initial state and checking if the FXSheet() function that is stored in a column named [Funct] is returning an error in the current iteration step. If the function is returning an error then stop the execution, else continue with the next argument.

each [Sheet=[Sheet]+1,Funct=FXSheet([Sheet]+1)])

If the condition (try [Funct])[HasError]=false is evaluated to true, then add a new row to the list, and set the initial state of that row to the following expression.
We again create a record type of object. For sheet column, we set new initial state to be the current row [Sheet] column value + 1, and for the Funct column we set new initial state to be FXSheet([Sheet]+1). So, the initial states will change with each iteration as follows.

Iteration Initial step

  1. [Sheet=1,Funct=FXSheet(1)]
  2. [Sheet=2,Funct=FXSheet(2)]
  3. [Sheet=3,Funct=FXSheet(3)]
  4. [Sheet=4,Funct=FXSheet(4)]
  5. [Sheet=5,Funct=FXSheet(5)]

At the 5th iteration, the condition will evaluate to false (since in our Excel file we only have sheets up to sheet 4), so the function will exit and we will receive a list containing nested records.


Each record will have 2 columns, column Sheet with the sheet number, and column Funct with nested table as the result of the invoked function over the current row.

After we expand the record, we can see that each row has its transposed table as a result of invoking FXSheet() function.




Using the Forth Argument to Further Transform Results of the Function


In this example we omitted the last argument, therefore returning the whole record of the current row after it evaluated to true. In case we wanted to further process the result of the evaluation, we could use the 4th argument.

For example, if we needed only column1 from nested tables, and we also wanted an additional column with the sheet number multiplied with an outer variable, we could write the 4th argument as follows.

let
Source =List.Generate(()=>[Sheet=1,Funct=FXSheet(1)],
each (try [Funct])[HasError]=false,
each [Sheet=[Sheet]+1,Funct=FXSheet([Sheet]+1)],
each [NestedList=[Funct][Column1],Multiplication=[Sheet]*OuterVariable]),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
In
#"Converted to Table"

As a result, we will receive a list holding nested records. Each record is holding 2 columns named NestedList and Multiplication.


NestedList is taking just Column1 from the table output, while the Multiplication column is combining the result of iteration with the OuterVariable (yes, that is possible!) to produce the final output. The last one is bringing even more juice to the function, so the applications and business problems you can solve with the use of List.Generate() are numerous.

Just remember that you must use it with caution since it tends to become slow if used incorrectly. In certain situations (like with nested List.Generate()) you can utilize the List.Buffer() function to significantly increase the overall execution speed, but that won’t help much in case there are too many iterations or the dataset is simply to complex. And try to avoid nested List.Generate() if possible.

In case you have any questions, please feel free to post them down below!

Table of Content

Table of Content

LEARN POWER QUERY

Improve your PowerQuery skills with
Exceed Academy

COMMENTS

Subscribe
Notify of
guest
18 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

DAX HANDBOOK
7. VARIABLES

Variables are used in almost every measure you will create. The reason we introduce them last is that they use all other parts of DAX code to produce faster, more powerful and maintainable code. Variables are like containers of a part of the DAX code which can be used throughout...

Read more

DAX HANDBOOK
6.8 ALLSELECTED

Explanation ALLSELECTED is one of the most complex functions in DAX. When used improperly it can lead to unexpected results. As a rule of thumb, you should not use it in iterative functions. It is preferable to use it only as a CALCULATE filter remover, not as a table function....

Read more

DAX HANDBOOK
6.7 Lineage

What is Lineage? Lineage is a part of DAX mechanics that enables us to use tables as filter arguments for the CALCULATE function. It’s also used in row2filter context transition and other aspects of the data model that involve filter propagation throughout the model. We can state that lineage is...

Read more

DAX HANDBOOK
6.6 Crossfilter

Crossfilter is a feature of DAX when it filters the underlying dataset even though there aren’t any visual filters present. Introduction In this example, we will explain a very important feature of CALCULATE filter arguments. We will also explain why you should always prefer the combination of ALL/REMOVEFILTER + VALUES...

Read more
18
0
Would love your thoughts, please comment.x
()
x
Scroll to Top