Hello everyone, I hope everyone is having a great day. Thank you so much for taking the time to read my blog.
To follow along with what I am doing in this blog, click here.
In data management, analytics, and database design world data modeling is crucial. Data modeling plays a vital role in ensuring that data is well-structured, accurate, and aligned with the needs of the organization. Now, when we are looking at a data model there are two ways of approaching the set-up process, the Star Schema, and the Snowflake Schema.
The first data model set-up I will address is the Star Schema. Generally, Star Schemas are simpler to work with and it is easier to organize. It works by having a central table called a “Fact Table” while there are other tables connected to it known as “Dimension Tables.” The “Fact Table” contains information about specific events or facts while the “Dimension Tables” contain descriptive attributes about these specific events or facts. The purpose of the Star Schema is to make it easy to understand but because of how it is laid out it causes increased chances for redundant data.
The alternative method for setting up your data is the Snowflake Schema, which is designed to minimize data redundancy. It works the same way as a Star Schema in that a “Fact Table” is what connects all of the tables, but the Snowflake Schema has tables that branch off other tables (I like to look at it as a “Tree Schema” since the tables branch off each other). The Snowflake Schema is more complex due to the intricate manner in which tables interconnect with one another and cascade further.
For the purpose of this blog, we are going to use the Snowflake Schema to set up our data. Power BI doesn’t automatically put the tables into one of these schemas, so we sort of have to do this ourselves and where we start doesn’t look very pretty (See Figure 1 below).
Figure 1 above illustrates the outcome after I managed to reorganize all the tables and bring them together a bit more. This is a little daunting but if we go through cleaning up the data model, we will get it optimized for our purposes.
To start cleaning up the data model, I am going to bring all of my “Fact Tables” to the bottom of the data modeling view and spread out all of the “Dimension Tables.” In Figure 2, you can see the result of my efforts.
Now this doesn’t really look too much better, but it is a starting point. In Figure 3 below, I optimize the data model by reducing the clutter of numerous connecting lines, resulting in a more condensed arrangement of tables. Notice, some of these tables did not connect with the other ones (highlighted in red in Figure 3 below). This is because there would be issues with the cardinality or uniqueness of the data in these tables. Because of this, we are going to delete these tables from the data model.
To delete tables from the data model, we will need to click on the ellipses next to the table names (highlighted in red in Figure 4). After doing this, we will click on the “Delete from model” button in the drop-down menu (highlighted in blue in Figure 4).
Now that we are done cleaning up the data model, we can move on to the report page and start creating some of our visualizations. Unfortunately, that will not be covered in today’s blog but will be coming soon. In the meantime, have a great day, God bless
For the final data model result using Snowflake Schema set-up, click here.
Bailey McDonald
Data Engineer, Patriot Consulting
Email: bkmcdonald@patriotconsultingcorp.com | Blogs: Patriot Consulting Blogs
LinkedIn: Personal: BaileyMcDonald | Company: Patriot Consulting