DAX HANDBOOK
5.1 Optimal data structures

If you wish to follow along, you can find PBIX/Excel files at the bottom of the article. 

Authors

Krešimir Ledinski

Krešimir Ledinski

Microsoft certified expert in the field of Business Intelligence. His biggest passions are DAX, M, and data modeling.

Kristian Radoš

Kristian Radoš

Experienced data analyst. Advanced in SQL, PowerApps and M language.

This section will explain why you would need a data model (having all the data in a single table is never a good practice), and the differences between dimension and fact tables. We will start this section first by understanding the most important data modeling rules, and those are the distinction between the filter and final values. If you work with structured data sources (database, DWH, etc.) then you will rarely think about these distinctions since you will have data prepared (in most cases) in an optimal way, but in the case of working with unstructured sources (exports from systems, web data, Excel files, etc.) the following few chapters will be of great benefit for you and will keep you from making rookie mistakes in your data models.

Filter Columns

Is the data in the table below in optimal form?

Even though the data is represented in a table form, this table is not in an optimal form. We have 3 columns with descriptive values (Brand, Product, Segment). We also have 3 columns with numbers we wish to analyze. All these 3 columns represent Sold QTY values. Those values are split into 3 columns based on dates. Dates are, by nature, also descriptive values (same as Brand, Product and Segment columns).

The rule is: All descriptive values of same nature (in this case dates) must be in their own single column which has the name most suitable for describing those values.

In our case the table from above should be transformed into the following table:

All the dates are put in a single column, called Date. Now we have 4 columns with descriptive values and one column with numbers we wish to analyze. These 4 columns are called Filter columns. We mostly use filter columns to:

  1. Put them in visuals on columns, rows, axis, filters and other fields which are used to perform filtering of the data
  2. Iterate through them with iterative functions (SUMX, FILTER, etc.)
  3. Use them to return COUNT/DISTINCT number of rows in table
  4. Use them as CALCULATE arguments to modify filter context

Value Columns

Now that we understand filter columns, let’s check if the following structure is optimal.

At the first sight, it seems ok. We have 5 descriptive columns and 1 column with values. But let’s focus on the Facts column. In it, we have 3 different facts (Cost, Price and Sold QTY). Those are not descriptive values of the same nature, but rather completely different descriptions of completely different final values we would want to analyze separately. Focus on the word final. We want to put all the final values of the same nature in their own columns! This table should be transformed into the following structure.

Now we have 4 filter columns and 3 value columns. Value columns consist of values we wish to analyze and in one value column must be only a unique type of value (e.g. Cost, Price, Qty). We mostly use value columns to:

  1. Use them in Measures to produce results we would put in the values field of the visual
  2. Group them to form filter columns

Above are the general rules of using filter and value columns. In Measure you will use both types of columns, but in different parts of measures.

Dimensions vs Fact Tables

So far, we explained 2 main types of columns in a single table. Now let’s explain the 2 main types of tables. When working with Power BI data models, the best approach is to use a star schema. Star schema consists of multiple Dimensional tables linked to one or more Fact tables.

In our example, we have 3 dimensional tables (tables with the prefix “Dim”), and a single fact table called FactSales. Dimension tables are tables that mostly consist of filter columns. They all have one column, called Key column, through which they connect to, and filter the fact table.

Key column is the column that consists of unique values and its granularity is the same as the granularity of the linked column in the FactTable. In database systems, key column is often a unique identifier (guide, string, or unique number combination). In non-database systems (e.g. data coming from Excel files), that key can be any column with unique values. For example, if your model is based on Excel files, and you wish to analyze sales by products and its descriptions (segment, brand, category, etc.), you can use product names as a key column, as long as each product name is unique.
Except Key column, there can be numerous other filter columns or even a few value ones. We can see that in DimProduct table we have 7 additional columns, 6 filter types, and one of the value type ([StandardCost]). Those columns have only one purpose, and that is to further describe the value of the same row in the key column.

If we focus on the first row in the DimProduct table (ProductKey = 603), all other columns further describe that product. So, product 603 has NA color, costs 53.94$, is of category Components and its name is HL Bottom Bracket. If we didn’t have [ProductKey] column, we could also use [ProductName] column as a key column, as long as all the descriptions of the products are unique and will tend to stay unique in the future. The other prerequisite is to also have product names as a column in the fact table. Then we could create a relationship between those 2 columns.

How to Define Dimension Scope?

Dimensions are often defined by business areas. If you sell products, you will have a product dimension with all the descriptions of products. If you sell products to multiple customers, you will have a customer dimension with information about the customer (e.g. address, region, customer status, …).
An optimal number of dimensions depends on the number of business areas you wish to include in the data model.

Where do you define the number of business areas?

You define that number in the fact tables. As opposed to dimension tables, which consist mostly of filter columns, fact tables mostly consist of value columns. Other columns present are foreign key columns, with only a few, if any, filter column types. The number of business areas (Dimensions) is defined by the number of foreign key columns. This is also called the fact table granularity.

Let’s say that we have a sales fact table, and the data in it is on the level of the receipt, meaning it has information about the customer, date of purchase, exact products he/she purchased, the person who sold it, and store the purchase was made in. From this fact table, we can read that we would need to make 5 different dimensions. Each dimension could have multiple descriptive columns
• DimCustomer – (key column could be CustomerID), description columns: Customer type, Age, Gender, Location, etc.
• DimDate – (key column Date), description columns: Year, Month, Quarter, Weekday, etc.
• DimStore – (key columns StoreID), description columns: Location, Region, Size of store, Store manager, etc.
• DimProduct – (key column ProductID or ProductName or EAN code), description columns: Name, Brand, Segment, Type, etc.
• DimEmployee – (key column EmployeeID), description columns: Age, Gender, Position, Wage, etc.

Let’s emphasize this sentence, which is true in most cases:
The granularity of the fact table defines the number of dimensions!

In our example we have 5 key columns in the fact table, then why do we have only, 3 dimensions?

[ProductKey] and [CustomerKey] columns are used as foreign key columns for DimProduct and DimCustomer. OrderDate and ShipDate are both the same type of dimension called DimDate. [SalesOrderNumber] column can be used as a dimension, but we decided not to go into such depth of analysis. In that case, this column can be used as a value type column, and can give us important information such as the number of orders or maximum sales per order.


What are the differences between key columns in the fact table compared to a key column in dimension?
In dimensions, key columns should always have unique values in the key column, while in the fact tables this is not the case. We can see in the picture above that customer keys are appearing more than once. Actually, it is being repeated for each product sold within the same sales order. Multiple key values can occur in every key in the fact table, that’s why we call that column a foreign key, or key on the many side of the table.

Why Use Dimensions Instead of a Single Big Table?


There are many reasons in favor of using star schema. The most important ones are:

  1. DAX code is easier to write and debug
  2. It’s easier to depict which columns belong to which area of business
  3. Time intelligence does not work if the model does not contain a dedicated Date table with all the dates supplied from the 1st of January to the 31st of December.
  4. You can filter multiple fact tables simultaneously (through dimensions)
  5. Advanced DAX measures might produce wrong results if used without dimensions.

Let’s explain the 4th point (the 3rd one will be explained in the Time intelligence chapter).
We will introduce another fact table, called Targets. Table “Targets” is on the granularity of product, date, and customer. This makes it an ideal candidate to connect to our already existing dimensions. This way we can filter both tables with filters coming from dimension tables.

Columns from dimensions DimProduct, DimCustomer and DimDate can simultaneously filter both FactSales and Forecast tables. Prerequisite for dimension being able to filter fact tables is to have an active relationship formed between those tables. When plotting columns on visuals, you should always use columns coming from dimension tables.

We could try to merge data from Targets into FactSales, but since sales are on the Year-Month granularity in the Forecast table (as opposed to days granularity in the FactSales table), this would pose a modeling challenge on how to merge the data in a correct way.
If we observe the relationship between DimDate and Forecast table, we can see that it is a type of many-many. We should tend to use M2M relationships as few as possible, and only in case we have a full understanding of its possible implications on the calculations. Since we brought up relationships, let’s start this topic with an explanation of the types of relationships we can use in a model.

Materials

We wish to create the best possible content!

If you are a novice looking for a better explanation of any element of the topic, feel free to comment on the part you didn't quite understand!

If you are an expert in DAX and believe certain topic lacks important internals, your comments are more than welcomed!

COMMENTS

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

OUR SERVICES

Prefer live training or consultations?

Table of Content

Table of Content

GET LATEST BI NEWS TO YOUR INBOX

Receive the latest updates on all business analyst news across all platforms.

By subscribing you are agreeing to our Privacy Policy.

Related blog posts

DAX HANDBOOK
7. VARIABLES

Variables are used in almost every measure you will create. The reason we introduce them last is that they use all other parts of DAX code to produce faster, more powerful and maintainable code. Variables are like containers of a part of the DAX code which can be used throughout...

Read more

DAX HANDBOOK
6.8 ALLSELECTED

Explanation ALLSELECTED is one of the most complex functions in DAX. When used improperly it can lead to unexpected results. As a rule of thumb, you should not use it in iterative functions. It is preferable to use it only as a CALCULATE filter remover, not as a table function....

Read more

DAX HANDBOOK
6.7 Lineage

What is Lineage? Lineage is a part of DAX mechanics that enables us to use tables as filter arguments for the CALCULATE function. It’s also used in row2filter context transition and other aspects of the data model that involve filter propagation throughout the model. We can state that lineage is...

Read more

DAX HANDBOOK
6.6 Crossfilter

Crossfilter is a feature of DAX when it filters the underlying dataset even though there aren’t any visual filters present. Introduction In this example, we will explain a very important feature of CALCULATE filter arguments. We will also explain why you should always prefer the combination of ALL/REMOVEFILTER + VALUES...

Read more
0
Would love your thoughts, please comment.x
()
x
Scroll to Top