This will become apparent quite quickly if your idea of a data model is just one super wide table. The concept of a data model is something that does take some getting used to, I still sometimes manage to get confused by it. A data model is a way of organizing your data so that, as your data set continues to grow over time, little-to-no manual actions need to be taken by the Data Analyst to incorporate it into the pre-existing file. There are a couple of different types of models that can be used, however the most common one is the Star Schema Model.
The Star Schema is a desired model as it promotes usability, performance and scalability and allows for simpler DAX. Now you might be like me when I first heard the term ‘Star Schema’ and think to yourself “Sure those are some nice words Chris but why should I care? My data model works just fine the way it is right now” and sure you are ok for now, but what happens to your model once you add another 20k rows in a years' time? Will your model (or lack thereof) be able to handle it?
The simple answer is no, even if it does hold up it will cause significant frustration throughout the process.
As you can see in the above example when the data is not modelled the table is too wide, making it quite difficult to understand.
The schema is modelled off, you guessed it, a star, with tables categorized as Fact Tables in the middle of the model and Dimension Tables on the points on the outer side.
A Fact Table is a table that contains specific event data, this can be transactional data such as sales data or appointment data. Each row in this table refers to an individual event, there may be hundreds or thousands of these entries in any given fact table. The information contained in the Fact Table should be specifically tied to the event, an example of this would be a Property Sales Table that contains relevant data tied to the sale of properties. Only information directly relating to the sale of the property is included.
A Dimension Table is a table containing information that is related to a business event, it is usually static in nature and does not change. These tables sit around the Fact Table and are used as a reference table. An example of this would be a Suburb Table. This negates the need to include all the information relating to the suburb in the Fact Table, making it longer, harder to read and slower for Power Bi to process the data.
Dimension tables will include single unique references per row that refer to a column from within the Fact Table. It is also important to note that Fact and Dimension Tables almost always connect to the opposite table type, it would be extremely rare to see 2 Fact Tables or 2 Dimension Tables linked to each other.
The model is created this way to allow the Fact Tables to continue to grow without this growth inhibiting how we use the data. The Fact Table will reference the Dimension Table when it requires further information that is not held within its own table, this increases the performance and usability of the data model.
These two types of tables are linked together by what is known as a Surrogate Key.
A Surrogate Key is a unique identifier that is common between the two table types that is used to create the linkage between them. Fact Tables will contain multiple entries of the same Surrogate Key, whereas the Dimension Table would contain only one reference to the same key. The Fact Table would ‘look up’ the Dimension Table for the extra information weld within relating to that Key.
The tables can then have different relationship types, dependent on the type of table and the direction of the relationship. For the above example instead of having all the Suburb information in the Property Sales Data Table, we use a separate Dimension Table called Post Code to obtain all the relevant suburb information like Postcode, Country, State etc. These 2 tables are linked by the Suburb Surrogate Key with the relationship being Many to One.
There are 4 different relationships that the different tables can have with each other, these are:
You can also choose whether you want a relationship to transfer information in a single direction or both directions using the Cross Filter Direction setting. For example, if cross-filter directions are set to both, information can travel from table A to table B as well as from table B to table A. However, if cross filter direction is set to single, information can only travel one way (from table A to table B OR from table B to table A, but not both)
I hope this sheds some light on the distinct types of tables and their use within the Star Schema Model. It is something that seems foreign at first but once you put it into practice it will change the way you model your data forever.
Next time I will show you how to develop your new Star Schema skills even further with the power of the Time Dimension Table. Feel free to reach out on Discord for any of your Data related needs!
Leave a Reply