Snowflake Schema
A Normalized Data Modeling Approach
Snowflake Schema: A Normalized Data Modeling Approach
The snowflake schema is an extension of the star schema and is widely used in data warehousing and business intelligence. While the star schema is simple and denormalized, the snowflake schema introduces normalization by breaking down dimension tables into smaller, related tables. This results in a structure that resembles a snowflake , hence the name.
Components of a Snowflake Schema
-
Fact Table:
- The central table in the snowflake schema, just like in the star schema.
- Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
- Each row in the fact table represents a specific event or transaction.
- Connected to dimension tables via foreign keys.
Example:
- A fact table for a retail store might store sales transactions:
Fact_Sales
:Txn_ID
,Date_ID
,Product_ID
,Customer_ID
,Store_ID
,Quantity_Sold
,Total_Amount
.
-
Dimension Tables:
- Surround the fact table like the points of a snowflake.
- Contain descriptive attributes (context or metadata) related to the facts.
- Unlike the star schema, dimension tables in a snowflake schema are normalized, meaning they are split into multiple related tables to reduce redundancy.
Examples:
Dim_Date
:Date_ID
,Date
,Month
,Quarter
,Year
,Day_of_Week
.Dim_Product
:Product_ID
,Product_Name
,Category_ID
,Brand_ID
,Price
.Dim_Category
:Category_ID
,Category_Name
.Dim_Brand
:Brand_ID
,Brand_Name
.Dim_Customer
:Customer_ID
,Customer_Name
,City_ID
,Phone_Number
.Dim_City
:City_ID
,City
,State
.Dim_Store
:Store_ID
,Store_Name
,City_ID
,Manager_Name
.
Example: Snowflake Schema for a Retail Store
Snowflake Schema
Fact Table
Fact_Sales
Txn_ID | Date_ID | Product_ID | Customer_ID | Store_ID | Qty_Sold | Total_Amount |
---|---|---|---|---|---|---|
101 | 20250101 | 1 | 1001 | 501 | 2 | 5000 |
102 | 20250102 | 2 | 1002 | 502 | 1 | 1500 |
103 | 20250103 | 3 | 1003 | 503 | 5 | 1000 |
Dimension Tables
-
Dim_Date
:Date_ID Date Month Quarter Year Day_of_Week 20250101 2025-01-01 January Q1 2025 Wednesday 20250102 2025-01-02 January Q1 2025 Thursday 20250103 2025-01-03 January Q1 2025 Friday -
Dim_Product
:Product_ID Product_Name Category_ID Brand_ID Price 1 Saree 1 1 2500 2 Kurta-Pajama 2 2 1500 3 Turmeric Powder 3 3 200 -
Dim_Category
:Category_ID Category_Name 1 Sarees 2 Ethnic Wear 3 Spices -
Dim_Brand
:Brand_ID Brand_Name 1 Nalli 2 Fashion India 3 ABC -
Dim_Customer
:Customer_ID Customer_Name City_ID Phone_Number 1001 Priya Sharma 1 9876543210 1002 Rajesh Patel 2 8765432109 1003 Anjali Singh 3 7654321098 -
Dim_City
:City_ID City State 1 Mumbai Maharashtra 2 Ahmedabad Gujarat 3 Delhi Delhi -
Dim_Store
:Store_ID Store_Name City_ID Manager_Name 501 Nalli Silks 4 Ramesh Kumar 502 Fashion India 5 Sunita Reddy 503 ABC Spices 3 Amit Sharma
How the Snowflake Schema Works
-
Querying Data:
- Suppose you want to find the total sales of sarees in Mumbai for January 2025.
- The query would join the
Fact_Sales
table with multiple dimension tables using their respective keys. - Example SQL Query:
-
Benefits:
- Reduced Redundancy: Normalization minimizes data duplication.
- Improved Data Integrity: Ensures consistency across related tables.
- Flexibility: Easier to maintain and update.
Advantages of Snowflake Schema
- Normalization: Reduces data redundancy and improves storage efficiency.
- Data Integrity: Ensures consistency by maintaining relationships between tables.
- Scalability: Suitable for complex data models with many relationships.
Disadvantages of Snowflake Schema
- Query Performance: More joins can lead to slower query performance compared to the star schema.
- Complexity: More tables and joins make the schema harder to design and understand.
- Business-Friendliness: Less intuitive for business users compared to the star schema.
Conclusion
The snowflake schema is a powerful data modeling technique that introduces normalization to reduce redundancy and improve data integrity. While it is complex than the star schema, it is well-suited for scenarios requiring detailed and structured data relationships.