Batch processing is a method of processing large volumes of data in groups (batches) at scheduled intervals, rather than processing each record as it arrives. It is commonly used for tasks such as data ingestion, transformation, and reporting, where results are not needed immediately.
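As a rough illustration of the grouping idea, the sketch below splits a stream of records into fixed-size batches and handles each batch as a unit. The names `batched` and `process_batch` and the batch size of 4 are hypothetical choices made for this example; real systems often define batches by time window rather than by record count.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull the next batch_size items; the final batch may be shorter.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch: List[int]) -> int:
    # Hypothetical per-batch work: here, just sum the values.
    return sum(batch)


# Ten records, processed four at a time instead of one by one.
results = [process_batch(batch) for batch in batched(range(10), 4)]
print(results)  # [6, 22, 17]
```

The work is done once per batch rather than once per record, which is where the throughput gain typically comes from, at the cost of waiting for a batch to fill or for the schedule to fire.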
Batch Processing involves:
Batch:
Scheduler:
ETL (Extract, Transform, Load):
Latency:
Throughput:
Data Collection:
Scheduling:
Apache Spark:
ETL Tools:
Cron:
Workflow Orchestration Tools:
E-Commerce:
Finance:
Healthcare:
Processing patient data from multiple sources for analysis and reporting.
Example: Aggregating patient records using Hadoop and generating daily reports.
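The aggregation step in the healthcare example can be sketched in plain Python. This is not a Hadoop job: it only shows the shape of the computation (group by day, then reduce) that such a job might perform. The `PatientRecord` fields, the `daily_report` function, and the sample rows are all assumptions invented for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple, Set


class PatientRecord(NamedTuple):
    # Hypothetical record layout; real sources will differ.
    patient_id: str
    source: str
    visit_date: date


def daily_report(records: Iterable[PatientRecord]) -> Dict[date, int]:
    """Count distinct patients seen per day across all sources."""
    patients_by_day: Dict[date, Set[str]] = defaultdict(set)
    for record in records:
        # A set deduplicates patients who appear in more than one source.
        patients_by_day[record.visit_date].add(record.patient_id)
    return {day: len(ids) for day, ids in sorted(patients_by_day.items())}


# Made-up sample data standing in for one collected batch.
batch = [
    PatientRecord("p1", "clinic", date(2024, 1, 1)),
    PatientRecord("p1", "lab", date(2024, 1, 1)),
    PatientRecord("p2", "clinic", date(2024, 1, 1)),
    PatientRecord("p3", "lab", date(2024, 1, 2)),
]

report = daily_report(batch)
for day, count in report.items():
    print(day.isoformat(), count)
```

In a scheduled setup, a function like this would run once per day over the records collected since the previous run.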