Star Schema - Rajanand

The star schema is a widely used data modeling technique in data warehousing and business intelligence. It is designed to simplify complex data structures and optimize query performance for analytical workloads. The star schema gets its name from its visual representation, which resembles a star , with a central fact table connected to multiple dimension tables.

Components of a Star Schema

Fact Table:
- The central table in the star schema.
- Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
- Each row in the fact table represents a specific event or transaction.
- Connected to dimension tables via foreign keys.
Example:
- A fact table for a retail store might store sales transactions:
  - Fact_Sales: Transaction_ID, Date_ID, Product_ID, Customer_ID, Store_ID, Quantity_Sold, Total_Amount.
Dimension Tables:
- Surround the fact table like the points of a star.
- Contain descriptive attributes (context or metadata) related to the facts.
- Used to filter, group, or label the data in the fact table.
Examples:
- Dim_Date: Date_ID, Date, Month, Quarter, Year, Day_of_Week.
- Dim_Product: Product_ID, Product_Name, Category, Brand, Price.
- Dim_Customer: Customer_ID, Customer_Name, City, State, Phone_Number.
- Dim_Store: Store_ID, Store_Name, City, Manager_Name.

Example: Star Schema for a Retail Store

Star Schema

Fact Table

Fact_Sales

Transaction_ID	Date_ID	Product_ID	Customer_ID	Store_ID	Quantity_Sold	Total_Amount
101	20250101	1	1001	501	2	5000
102	20250102	2	1002	502	1	1500
103	20250103	3	1003	503	5	1000

Dimension Tables

Dim_Date:

Date_ID Date Month Quarter Year Day_of_Week
20250101 2025-01-01 January Q1 2025 Wednesday
20250102 2025-01-02 January Q1 2025 Thursday
20250103 2025-01-03 January Q1 2025 Friday
Dim_Product:

Product_ID Product_Name Category Brand Price
1 Saree Sarees Nalli 2500
2 Kurta-Pajama Ethnic Wear Fashion India 1500
3 Turmeric Powder Spices ABC 200
Dim_Customer:

Customer_ID Customer_Name City State Phone_Number
1001 Priya Sharma Mumbai Maharashtra 9876543210
1002 Rajesh Patel Ahmedabad Gujarat 8765432109
1003 Anjali Singh Delhi Delhi 7654321098
Dim_Store:

Store_ID Store_Name City Manager_Name
501 Nalli Silks Chennai Ramesh Kumar
502 Fashion India Bengaluru Sunita Reddy
503 ABC Spices Delhi Amit Sharma

Date_ID	Date	Month	Quarter	Year	Day_of_Week
20250101	2025-01-01	January	Q1	2025	Wednesday
20250102	2025-01-02	January	Q1	2025	Thursday
20250103	2025-01-03	January	Q1	2025	Friday

Product_ID	Product_Name	Category	Brand	Price
1	Saree	Sarees	Nalli	2500
2	Kurta-Pajama	Ethnic Wear	Fashion India	1500
3	Turmeric Powder	Spices	ABC	200

Customer_ID	Customer_Name	City	State	Phone_Number
1001	Priya Sharma	Mumbai	Maharashtra	9876543210
1002	Rajesh Patel	Ahmedabad	Gujarat	8765432109
1003	Anjali Singh	Delhi	Delhi	7654321098

Store_ID	Store_Name	City	Manager_Name
501	Nalli Silks	Chennai	Ramesh Kumar
502	Fashion India	Bengaluru	Sunita Reddy
503	ABC Spices	Delhi	Amit Sharma

How the Star Schema Works

Querying Data:

Suppose you want to find the total sales of sarees in Mumbai for January 2025.
The query would join the Fact_Sales table with the Dim_Date, Dim_Product, and Dim_Customer tables using their respective keys.

Example SQL Query:

SELECT SUM(Fact_Sales.Total_Amount) AS Total_Sales
FROM Fact_Sales
JOIN Dim_Date ON Fact_Sales.Date_ID = Dim_Date.Date_ID
JOIN Dim_Product ON Fact_Sales.Product_ID = Dim_Product.Product_ID
JOIN Dim_Customer ON Fact_Sales.Customer_ID = Dim_Customer.Customer_ID
WHERE Dim_Product.Category = 'Sarees'
  AND Dim_Customer.City = 'Mumbai'
  AND Dim_Date.Month = 'January'
  AND Dim_Date.Year = 2025;

Benefits:
- Simplified Queries: The star schema makes it easy to write and understand queries.
- Improved Performance: Optimized for read-heavy analytical workloads.
- Scalability: Can handle large datasets efficiently.

Advantages of Star Schema

Query Performance: Fewer joins are required compared to normalized schemas, leading to faster queries.
Simplicity: Easy to design and understand, even for non-technical users.
Flexibility: New dimensions can be added without disrupting existing queries.
Business-Friendly: Aligns well with how business users think about data (e.g., sales, customers, products).

Disadvantages of Star Schema

Data Redundancy: Dimension tables may contain redundant data (e.g., repeating city names in Dim_Customer).
Limited Flexibility for Complex Relationships: Not suitable for scenarios requiring many-to-many relationships between dimensions.
Storage Overhead: Large dimension tables can consume significant storage space.

Conclusion

The star schema is a powerful and intuitive data modeling technique that simplifies data analysis and improves query performance. By organizing data into a central fact table and surrounding dimension tables, it provides a clear and efficient structure for business intelligence and analytics.

Data Modeling

​Components of a Star Schema

​Example: Star Schema for a Retail Store

​Fact Table

​Dimension Tables

​How the Star Schema Works

​Advantages of Star Schema

​Disadvantages of Star Schema

​Conclusion