Snowflake Schema: A Normalized Data Modeling Approach

The snowflake schema is an extension of the star schema and is widely used in data warehousing and business intelligence. While the star schema is simple and denormalized, the snowflake schema introduces normalization by breaking down dimension tables into smaller, related tables. This results in a structure that resembles a snowflake , hence the name.

Components of a Snowflake Schema

  1. Fact Table:

    • The central table in the snowflake schema, just like in the star schema.
    • Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
    • Each row in the fact table represents a specific event or transaction.
    • Connected to dimension tables via foreign keys.

    Example:

    • A fact table for a retail store might store sales transactions:
      • Fact_Sales: Txn_ID, Date_ID, Product_ID, Customer_ID, Store_ID, Quantity_Sold, Total_Amount.
  2. Dimension Tables:

    • Surround the fact table like the points of a snowflake.
    • Contain descriptive attributes (context or metadata) related to the facts.
    • Unlike the star schema, dimension tables in a snowflake schema are normalized, meaning they are split into multiple related tables to reduce redundancy.

    Examples:

    • Dim_Date: Date_ID, Date, Month, Quarter, Year, Day_of_Week.
    • Dim_Product: Product_ID, Product_Name, Category_ID, Brand_ID, Price.
    • Dim_Category: Category_ID, Category_Name.
    • Dim_Brand: Brand_ID, Brand_Name.
    • Dim_Customer: Customer_ID, Customer_Name, City_ID, Phone_Number.
    • Dim_City: City_ID, City, State.
    • Dim_Store: Store_ID, Store_Name, City_ID, Manager_Name.

Example: Snowflake Schema for a Retail Store

Snowflake Schema

Fact Table

Fact_Sales

Txn_IDDate_IDProduct_IDCustomer_IDStore_IDQty_SoldTotal_Amount
101202501011100150125000
102202501022100250211500
103202501033100350351000

Dimension Tables

  1. Dim_Date:

    Date_IDDateMonthQuarterYearDay_of_Week
    202501012025-01-01JanuaryQ12025Wednesday
    202501022025-01-02JanuaryQ12025Thursday
    202501032025-01-03JanuaryQ12025Friday
  2. Dim_Product:

    Product_IDProduct_NameCategory_IDBrand_IDPrice
    1Saree112500
    2Kurta-Pajama221500
    3Turmeric Powder33200
  3. Dim_Category:

    Category_IDCategory_Name
    1Sarees
    2Ethnic Wear
    3Spices
  4. Dim_Brand:

    Brand_IDBrand_Name
    1Nalli
    2Fashion India
    3ABC
  5. Dim_Customer:

    Customer_IDCustomer_NameCity_IDPhone_Number
    1001Priya Sharma19876543210
    1002Rajesh Patel28765432109
    1003Anjali Singh37654321098
  6. Dim_City:

    City_IDCityState
    1MumbaiMaharashtra
    2AhmedabadGujarat
    3DelhiDelhi
  7. Dim_Store:

    Store_IDStore_NameCity_IDManager_Name
    501Nalli Silks4Ramesh Kumar
    502Fashion India5Sunita Reddy
    503ABC Spices3Amit Sharma

How the Snowflake Schema Works

  1. Querying Data:

    • Suppose you want to find the total sales of sarees in Mumbai for January 2025.
    • The query would join the Fact_Sales table with multiple dimension tables using their respective keys.
    • Example SQL Query:
      SELECT SUM(Fact_Sales.Total_Amount) AS Total_Sales
      FROM Fact_Sales
      JOIN Dim_Date ON Fact_Sales.Date_ID = Dim_Date.Date_ID
      JOIN Dim_Product ON Fact_Sales.Product_ID = Dim_Product.Product_ID
      JOIN Dim_Category ON Dim_Product.Category_ID = Dim_Category.Category_ID
      JOIN Dim_Customer ON Fact_Sales.Customer_ID = Dim_Customer.Customer_ID
      JOIN Dim_City ON Dim_Customer.City_ID = Dim_City.City_ID
      WHERE Dim_Category.Category_Name = 'Sarees'
        AND Dim_City.City = 'Mumbai'
        AND Dim_Date.Month = 'January'
        AND Dim_Date.Year = 2025;
      
  2. Benefits:

    • Reduced Redundancy: Normalization minimizes data duplication.
    • Improved Data Integrity: Ensures consistency across related tables.
    • Flexibility: Easier to maintain and update.

Advantages of Snowflake Schema

  1. Normalization: Reduces data redundancy and improves storage efficiency.
  2. Data Integrity: Ensures consistency by maintaining relationships between tables.
  3. Scalability: Suitable for complex data models with many relationships.

Disadvantages of Snowflake Schema

  1. Query Performance: More joins can lead to slower query performance compared to the star schema.
  2. Complexity: More tables and joins make the schema harder to design and understand.
  3. Business-Friendliness: Less intuitive for business users compared to the star schema.

Conclusion

The snowflake schema is a powerful data modeling technique that introduces normalization to reduce redundancy and improve data integrity. While it is complex than the star schema, it is well-suited for scenarios requiring detailed and structured data relationships.