Snowflake Schema - Rajanand

Snowflake Schema: A Normalized Data Modeling Approach

The snowflake schema is an extension of the star schema and is widely used in data warehousing and business intelligence. While the star schema is simple and denormalized, the snowflake schema introduces normalization by breaking down dimension tables into smaller, related tables. This results in a structure that resembles a snowflake , hence the name.

Components of a Snowflake Schema

Fact Table:
- The central table in the snowflake schema, just like in the star schema.
- Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
- Each row in the fact table represents a specific event or transaction.
- Connected to dimension tables via foreign keys.
Example:
- A fact table for a retail store might store sales transactions:
  - Fact_Sales: Txn_ID, Date_ID, Product_ID, Customer_ID, Store_ID, Quantity_Sold, Total_Amount.
Dimension Tables:
- Surround the fact table like the points of a snowflake.
- Contain descriptive attributes (context or metadata) related to the facts.
- Unlike the star schema, dimension tables in a snowflake schema are normalized, meaning they are split into multiple related tables to reduce redundancy.
Examples:
- Dim_Date: Date_ID, Date, Month, Quarter, Year, Day_of_Week.
- Dim_Product: Product_ID, Product_Name, Category_ID, Brand_ID, Price.
- Dim_Category: Category_ID, Category_Name.
- Dim_Brand: Brand_ID, Brand_Name.
- Dim_Customer: Customer_ID, Customer_Name, City_ID, Phone_Number.
- Dim_City: City_ID, City, State.
- Dim_Store: Store_ID, Store_Name, City_ID, Manager_Name.

Example: Snowflake Schema for a Retail Store

Star Schema

Snowflake Schema

Fact Table

Fact_Sales

Txn_ID	Date_ID	Product_ID	Customer_ID	Store_ID	Qty_Sold	Total_Amount
101	20250101	1	1001	501	2	5000
102	20250102	2	1002	502	1	1500
103	20250103	3	1003	503	5	1000

Dimension Tables

Dim_Date:

Date_ID Date Month Quarter Year Day_of_Week
20250101 2025-01-01 January Q1 2025 Wednesday
20250102 2025-01-02 January Q1 2025 Thursday
20250103 2025-01-03 January Q1 2025 Friday
Dim_Product:

Product_ID Product_Name Category_ID Brand_ID Price
1 Saree 1 1 2500
2 Kurta-Pajama 2 2 1500
3 Turmeric Powder 3 3 200
Dim_Category:

Category_ID Category_Name
1 Sarees
2 Ethnic Wear
3 Spices
Dim_Brand:

Brand_ID Brand_Name
1 Nalli
2 Fashion India
3 ABC
Dim_Customer:

Customer_ID Customer_Name City_ID Phone_Number
1001 Priya Sharma 1 9876543210
1002 Rajesh Patel 2 8765432109
1003 Anjali Singh 3 7654321098
Dim_City:

City_ID City State
1 Mumbai Maharashtra
2 Ahmedabad Gujarat
3 Delhi Delhi
Dim_Store:

Store_ID Store_Name City_ID Manager_Name
501 Nalli Silks 4 Ramesh Kumar
502 Fashion India 5 Sunita Reddy
503 ABC Spices 3 Amit Sharma

Date_ID	Date	Month	Quarter	Year	Day_of_Week
20250101	2025-01-01	January	Q1	2025	Wednesday
20250102	2025-01-02	January	Q1	2025	Thursday
20250103	2025-01-03	January	Q1	2025	Friday

Product_ID	Product_Name	Category_ID	Brand_ID	Price
1	Saree	1	1	2500
2	Kurta-Pajama	2	2	1500
3	Turmeric Powder	3	3	200

Category_ID	Category_Name
1	Sarees
2	Ethnic Wear
3	Spices

Brand_ID	Brand_Name
1	Nalli
2	Fashion India
3	ABC

Customer_ID	Customer_Name	City_ID	Phone_Number
1001	Priya Sharma	1	9876543210
1002	Rajesh Patel	2	8765432109
1003	Anjali Singh	3	7654321098

City_ID	City	State
1	Mumbai	Maharashtra
2	Ahmedabad	Gujarat
3	Delhi	Delhi

Store_ID	Store_Name	City_ID	Manager_Name
501	Nalli Silks	4	Ramesh Kumar
502	Fashion India	5	Sunita Reddy
503	ABC Spices	3	Amit Sharma

How the Snowflake Schema Works

Querying Data:

Suppose you want to find the total sales of sarees in Mumbai for January 2025.
The query would join the Fact_Sales table with multiple dimension tables using their respective keys.

Example SQL Query:

SELECT SUM(Fact_Sales.Total_Amount) AS Total_Sales
FROM Fact_Sales
JOIN Dim_Date ON Fact_Sales.Date_ID = Dim_Date.Date_ID
JOIN Dim_Product ON Fact_Sales.Product_ID = Dim_Product.Product_ID
JOIN Dim_Category ON Dim_Product.Category_ID = Dim_Category.Category_ID
JOIN Dim_Customer ON Fact_Sales.Customer_ID = Dim_Customer.Customer_ID
JOIN Dim_City ON Dim_Customer.City_ID = Dim_City.City_ID
WHERE Dim_Category.Category_Name = 'Sarees'
  AND Dim_City.City = 'Mumbai'
  AND Dim_Date.Month = 'January'
  AND Dim_Date.Year = 2025;

Benefits:
- Reduced Redundancy: Normalization minimizes data duplication.
- Improved Data Integrity: Ensures consistency across related tables.
- Flexibility: Easier to maintain and update.

Advantages of Snowflake Schema

Normalization: Reduces data redundancy and improves storage efficiency.
Data Integrity: Ensures consistency by maintaining relationships between tables.
Scalability: Suitable for complex data models with many relationships.

Disadvantages of Snowflake Schema

Query Performance: More joins can lead to slower query performance compared to the star schema.
Complexity: More tables and joins make the schema harder to design and understand.
Business-Friendliness: Less intuitive for business users compared to the star schema.

Conclusion

The snowflake schema is a powerful data modeling technique that introduces normalization to reduce redundancy and improve data integrity. While it is complex than the star schema, it is well-suited for scenarios requiring detailed and structured data relationships.

Data Modeling

​Snowflake Schema: A Normalized Data Modeling Approach

​Components of a Snowflake Schema

​Example: Snowflake Schema for a Retail Store

​Fact Table

​Dimension Tables

​How the Snowflake Schema Works

​Advantages of Snowflake Schema

​Disadvantages of Snowflake Schema

​Conclusion