Rajanand
Requirements Gathering
Introduction
Goal
: Understand how to gather requirements and design data systems that meet stakeholder needs and business goals.
Key Concepts
:
Hierarchy of Needs
: Business goals → Stakeholder needs → System requirements (functional and non-functional).
Requirements Gathering
: Conversations with stakeholders to understand their needs and current systems.
Documentation
: Clearly document functional and non-functional requirements.
Hierarchy of Needs
Business Goals
:
High-level objectives (e.g., increase revenue, improve customer retention, expand to new markets).
Example: "The company aims to grow by launching new products and improving customer retention."
Stakeholder Needs
:
What stakeholders (e.g., marketing, data scientists) need to achieve business goals.
Example: "Marketing needs real-time dashboards to monitor product sales and a recommender system for personalized product recommendations."
System Requirements
:
Functional Requirements
: What the system must do (e.g., serve data no more than one hour old).
Non-Functional Requirements
: Characteristics of the system (e.g., scalability, reliability, latency).
Requirements Gathering Process
Identify Stakeholders
:
Talk to leadership (e.g., CTO, CEO) to understand business goals.
Engage with end users (e.g., marketing, data scientists) to understand their needs.
Understand Current Systems
:
Learn about existing systems and their limitations.
Example: Marketing currently gets daily data files, but they need real-time data.
Ask Key Questions
:
What actions will stakeholders take with the data?
What problems exist with the current system?
Who else should you talk to for more information?
Document Requirements
:
Use a hierarchical format to connect business goals, stakeholder needs, and system requirements.
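One way to keep the hierarchy explicit is to record it as nested data, so every system requirement traces back to a stakeholder need and a business goal. The following is a minimal sketch, not a prescribed format: the class names (`BusinessGoal`, `StakeholderNeed`, `SystemRequirement`) and the `render` helper are hypothetical, and the sample entries are taken from the examples in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class SystemRequirement:
    description: str
    kind: str  # "functional" or "non-functional"


@dataclass
class StakeholderNeed:
    stakeholder: str
    need: str
    requirements: list[SystemRequirement] = field(default_factory=list)


@dataclass
class BusinessGoal:
    goal: str
    needs: list[StakeholderNeed] = field(default_factory=list)


def render(goal: BusinessGoal) -> str:
    """Render the hierarchy as an indented plain-text outline."""
    lines = [f"Business goal: {goal.goal}"]
    for need in goal.needs:
        lines.append(f"  Stakeholder need ({need.stakeholder}): {need.need}")
        for req in need.requirements:
            lines.append(f"    [{req.kind}] {req.description}")
    return "\n".join(lines)


goal = BusinessGoal(
    goal="Improve customer retention",
    needs=[
        StakeholderNeed(
            stakeholder="Marketing",
            need="Dashboards to monitor product sales",
            requirements=[
                SystemRequirement("Serve data no more than one hour old", "functional"),
                SystemRequirement("Handle peak user activity without slowing down", "non-functional"),
            ],
        )
    ],
)
print(render(goal))
```

Keeping the three levels in one structure makes it easy to spot a requirement that has no stakeholder need behind it.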
Functional Requirements
Analytics Dashboards
:
Serve data no more than one hour old.
Example: "The system must provide near-real-time product sales data (no more than one hour old) for marketing dashboards."
Recommender System
:
Provide training data for the recommender model.
Ingest, transform, and serve user data to the model.
Return product recommendations to the sales platform.
Example: "The system must serve personalized product recommendations based on user behavior."
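A requirement such as "serve data no more than one hour old" for the analytics dashboards is easier to verify once it is written as a check. The sketch below assumes the pipeline can report the timestamp of the newest record it has served; the names `MAX_STALENESS` and `is_fresh` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# One-hour limit taken from the dashboard requirement.
MAX_STALENESS = timedelta(hours=1)


def is_fresh(last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the newest served record is at most one hour old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at <= MAX_STALENESS


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(minutes=45), now))  # inside the one-hour limit
print(is_fresh(now - timedelta(hours=3), now))     # stale
```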
Non-Functional Requirements
Analytics Dashboards
:
Scalability
: Handle peak user activity without slowing down.
Reliability
: Perform data quality checks to ensure data conforms to the expected format.
Maintainability
: Easily adapt to changes in data schema.
Recommender System
:
Latency
: Serve recommendations in less than one second.
Scalability
: Handle maximum concurrent users.
Reliability
: Default to popular products if the recommender fails.
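The reliability requirement for the recommender (default to popular products if it fails) can be sketched as a thin wrapper around the model call. This is an illustration under assumptions: the function names, the product identifiers, and the choice to treat an empty result as a failure are all hypothetical.

```python
from typing import Callable

# Hypothetical fallback list; in practice this would be derived from sales data.
POPULAR_PRODUCTS = ["product_a", "product_b", "product_c"]


def recommend_with_fallback(
    user_id: str,
    recommender: Callable[[str], list[str]],
    fallback: list[str] = POPULAR_PRODUCTS,
) -> list[str]:
    """Return model recommendations, or popular products if the model fails."""
    try:
        results = recommender(user_id)
    except Exception:
        # Any recommender failure degrades to the popular-products list.
        return list(fallback)
    # Assumption: an empty result is treated like a failure.
    return results or list(fallback)


def broken_recommender(user_id: str) -> list[str]:
    """Simulates a recommender outage."""
    raise TimeoutError("model endpoint did not respond")


print(recommend_with_fallback("user_1", broken_recommender))
```

The user still sees a product list when the model is unavailable, which is the behavior the requirement asks for.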
Conversations with Stakeholders
Marketing Team
:
Needs real-time dashboards to monitor product sales and react to demand spikes.
Wants a personalized recommender system for customers.
Data Scientists
:
Currently work with daily data files but need real-time data for dashboards and recommender models.
Software Engineers
:
Plan to set up a read replica database and API for continuous data access.
Will notify data engineers of schema changes and system outages.
Trade-Offs in Requirements Gathering
Iron Triangle
:
Scope
: Features and functionality of the system.
Timeline
: How quickly the system needs to be built.
Cost
: Budget constraints for the project.
Key Insight
: You can't optimize all three simultaneously (e.g., delivering fast and cheap usually means cutting scope or quality).
Solution
:
Build loosely coupled systems for flexibility.
Make reversible decisions (two-way doors).
Deeply understand stakeholder needs to prioritize effectively.
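Loose coupling and reversible decisions can be illustrated with a small interface. In the sketch below, downstream code depends only on a `SalesSource` protocol, so replacing the current daily file delivery with a read-replica API later does not require rewriting it. All class names and sample records are hypothetical.

```python
from typing import Protocol


class SalesSource(Protocol):
    """Anything that can return sales records; downstream code depends only on this."""

    def read_sales(self) -> list[dict]: ...


class DailyFileSource:
    """Stand-in for the current daily file delivery (hypothetical)."""

    def read_sales(self) -> list[dict]:
        return [{"product": "A", "units": 3}]


class ReplicaApiSource:
    """Stand-in for a future read-replica API (hypothetical)."""

    def read_sales(self) -> list[dict]:
        return [{"product": "A", "units": 3}, {"product": "B", "units": 1}]


def total_units(source: SalesSource) -> int:
    # Depends on the interface only, so swapping the source is a two-way door.
    return sum(row["units"] for row in source.read_sales())


print(total_units(DailyFileSource()))
print(total_units(ReplicaApiSource()))
```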
Sample Project: Recommender System
Key Components of the Project
Recommender System
: A content-based recommender system is being developed to recommend products to users based on:
User features
: Customer number, credit limit, city, postal code, country.
Product features
: Product code, quantity in stock, buy price, MSRP, product line, product scale.
User interactions
: Products browsed or added to the cart.
Two Data Pipelines
:
Batch Data Pipeline
: Delivers training data to the data scientist for model retraining.
Streaming Data Pipeline
: Provides real-time product recommendations to users based on their activity.
Functional Requirements
Batch Pipeline
:
Deliver training data in tabular format.
Include user features, product features, and user ratings (1-5).
Support retraining the model periodically (weekly, monthly, or quarterly).
Handle modifications in data format (e.g., new user or product features).
Streaming Pipeline
:
Provide real-time recommendations with sub-second latency (under 1 second).
Handle up to 10,000 concurrent users, with potential for growth.
Use a pre-trained recommender model to generate recommendations.
Save model outputs for later analysis.
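The batch pipeline requirements (tabular output, user and product features, ratings from 1 to 5, tolerance for new features) can be sketched with the standard library alone. This is a minimal illustration, not the course's implementation: the function names, the column prefixes, and the sample records are made up for the example.

```python
import csv
import io


def build_training_rows(users: dict, products: dict, ratings: list[dict]) -> list[dict]:
    """Join each rating with its user and product features into one flat row."""
    rows = []
    for r in ratings:
        if not 1 <= r["rating"] <= 5:
            raise ValueError(f"rating out of range 1-5: {r['rating']}")
        row = {"rating": r["rating"]}
        # Prefix feature names so user and product columns cannot collide.
        row.update({f"user_{k}": v for k, v in users[r["customer_number"]].items()})
        row.update({f"product_{k}": v for k, v in products[r["product_code"]].items()})
        rows.append(row)
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV; a newly added feature shows up as a new column."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for illustration only.
USERS = {103: {"city": "Nantes", "credit_limit": 21000}}
PRODUCTS = {"P1": {"line": "Classic Cars", "msrp": 95.7}}
RATINGS = [{"customer_number": 103, "product_code": "P1", "rating": 4}]

rows = build_training_rows(USERS, PRODUCTS, RATINGS)
print(to_csv(rows))
```

Because the column list is computed from the rows, adding a feature to the user or product records changes the output schema without code changes. Whether that is desirable or should instead trigger a schema check is a design decision for the team.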
Non-Functional Requirements
Latency
: Recommendations must be generated in under 1 second to match page rendering times.
Scalability
: The system must handle spikes of up to 10,000 concurrent users and scale as the company grows.
Flexibility
: The system should accommodate changes in data format (e.g., new features).
Operational Overhead
: Minimize the effort required to deliver new batches of training data.
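The latency requirement above only becomes testable once it is tied to a measurement. The sketch below checks a set of recorded response times against the 1-second budget using a nearest-rank percentile. The choice of the 99th percentile is an assumption, since the notes do not say which statistic the requirement applies to, and the sample timings are invented.

```python
import math

LATENCY_BUDGET_S = 1.0  # from the "under 1 second" requirement


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_latency_budget(
    samples: list[float], pct: float = 99.0, budget: float = LATENCY_BUDGET_S
) -> bool:
    """True if the chosen percentile of response times is under the budget."""
    return percentile(samples, pct) < budget


# Invented response times in seconds; one slow outlier.
samples = [0.12, 0.30, 0.25, 0.90, 0.18, 0.22, 0.40, 0.35, 0.28, 1.40]
print(percentile(samples, 50))
print(meets_latency_budget(samples))
```

With these samples the median is well under the budget while the slowest request is not, which is why it matters to agree with stakeholders on the statistic as well as the threshold.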
Recommender System Details
Content-Based Recommender
:
Uses vector embeddings for users and products to find similarities.
Predicts user ratings for products based on embeddings.
Combines recommendations from:
User features (e.g., "Based on your profile, you may like…").
Product interactions (e.g., "Based on your browsing history, you may like…").
Vector Database
:
Stores precomputed product embeddings for faster similarity searches.
Organizes embeddings so similar products are close together, speeding up retrieval.
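The idea of finding similar items through embeddings can be shown with a brute-force cosine similarity search. This is only a sketch: the three-dimensional vectors and product names are invented, and the linear scan stands in for the indexed lookup a vector database would perform.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], embeddings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k product names whose embeddings are closest to the query."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Invented embeddings for illustration only.
PRODUCT_EMBEDDINGS = {
    "classic_car": [0.9, 0.1, 0.0],
    "vintage_car": [0.8, 0.2, 0.1],
    "motorcycle": [0.1, 0.9, 0.2],
}

user_embedding = [1.0, 0.0, 0.0]
print(most_similar(user_embedding, PRODUCT_EMBEDDINGS))
```

A linear scan like this compares the query with every product, so its cost grows with the catalog. Precomputing and indexing the product embeddings is what lets a vector database answer the same question faster.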
Implementation Steps
Extract Requirements
: Identify functional and non-functional requirements from the conversation.
Select Tools
: Choose AWS tools and services that meet the requirements.
Lab Exercise
:
Set up a batch pipeline to deliver training data.
Use a pre-trained recommender system for the streaming pipeline.
Implement a vector database for fast similarity searches.
Key Takeaways
Stakeholder Engagement
:
Talk to leadership, end users, and source system owners to understand needs and constraints.
Documentation
:
Clearly document functional and non-functional requirements using a hierarchical format.
Trade-Offs
:
Balance scope, timeline, and cost by applying principles like loose coupling and reversible decisions.
System Design
:
Design systems that are scalable, reliable, and maintainable to meet stakeholder needs and business goals.
Source
: DeepLearning.AI data engineering course.