_delta_log) and Parquet files are updated.
Scenario
- You have a Delta Table with the following data stored in a single Parquet file (1.parquet):
| id | name |
|---|---|
| 1 | Arun |
| 2 | Bala |
- You want to add a new row to the table: (3, "Raj").
Step-by-Step Process
1. Initial State
- The table is created with 1.parquet, and the transaction log (0.json) records this initial state:
0.json
- At this point, the table contains only 1.parquet, and the transaction log (0.json) reflects that.
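As a rough illustration of this initial state, here is a toy in-memory model. It is not the Delta Lake API: `storage` and `create_table` are hypothetical names, the list of tuples stands in for a real Parquet file, and the single `add` action is a simplified version of what a real commit records (real commits carry more fields, and real file names are longer than 0.json and 1.parquet).

```python
import json

# Toy in-memory "storage": file name -> contents.
# This is an illustrative stand-in, not the real on-disk layout.
storage = {}

def create_table(rows):
    """Write the first data file and record it in commit 0."""
    # Stand-in for writing a real Parquet file.
    storage["1.parquet"] = list(rows)
    # One JSON action per line; here a single simplified "add" action.
    actions = [{"add": {"path": "1.parquet"}}]
    storage["_delta_log/0.json"] = "\n".join(json.dumps(a) for a in actions)

create_table([(1, "Arun"), (2, "Bala")])
print(storage["_delta_log/0.json"])  # {"add": {"path": "1.parquet"}}
```

The point of the sketch is only that the log entry names the data file; the rows themselves live in the data file, not in the log.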
2. Adding a New Row
When you add the new row (3, "Raj"), the following happens:
Insert operation on a delta table
- A new Parquet file (2.parquet) is created: This file contains only the new row (3, "Raj").
- The transaction log is updated: A new transaction log entry (1.json) is created to record the addition of 2.parquet.
1.json
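The two steps of the insert can be sketched with the same kind of toy model. Again, this is an assumption-laden illustration rather than Delta Lake's implementation: `append_rows` and the `storage` dictionary are hypothetical, and the file naming simply follows the simplified names used in this walkthrough.

```python
import json

def append_rows(storage, rows):
    """Append by writing a NEW data file and a NEW log entry.

    Existing data files are never touched.
    """
    # The next version number equals the number of commits so far.
    version = sum(1 for name in storage if name.startswith("_delta_log/"))
    # Naming follows the walkthrough: commit 1 adds 2.parquet.
    data_path = f"{version + 1}.parquet"
    storage[data_path] = list(rows)  # stand-in for writing a Parquet file
    storage[f"_delta_log/{version}.json"] = json.dumps({"add": {"path": data_path}})
    return version

# State after the initial commit.
storage = {
    "1.parquet": [(1, "Arun"), (2, "Bala")],
    "_delta_log/0.json": json.dumps({"add": {"path": "1.parquet"}}),
}

new_version = append_rows(storage, [(3, "Raj")])
```

After the call, `storage` holds 2.parquet and `_delta_log/1.json` alongside the original two entries, and 1.parquet is unchanged.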
3. Final State
After the new row is added:
- The table now has two Parquet files:
  - 1.parquet: Contains the original rows (1, "Arun") and (2, "Bala").
  - 2.parquet: Contains the new row (3, "Raj").
- The transaction log (_delta_log) now has two entries:
  - 0.json: Records the creation of 1.parquet.
  - 1.json: Records the addition of 2.parquet.
How Queries Work
When you query the table after adding the new row:
- The transaction log is consulted to determine which Parquet files are part of the current version of the table.
- The query reads data from both 1.parquet and 2.parquet.
- The result is a combined view of the data:
| id | name |
|---|---|
| 1 | Arun |
| 2 | Bala |
| 3 | Raj |
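The read path described above can be sketched as a log replay followed by a scan. This is a minimal toy under stated assumptions: `active_files` and `scan` are hypothetical helpers, the `storage` dictionary stands in for the table directory, and only simplified `add` and `remove` actions are handled.

```python
import json

LOG_PREFIX, LOG_SUFFIX = "_delta_log/", ".json"

def active_files(storage):
    """Replay the log in version order and return the current data files."""
    versions = sorted(
        int(name[len(LOG_PREFIX):-len(LOG_SUFFIX)])
        for name in storage
        if name.startswith(LOG_PREFIX)
    )
    files = []
    for version in versions:
        for line in storage[f"{LOG_PREFIX}{version}{LOG_SUFFIX}"].splitlines():
            action = json.loads(line)
            if "add" in action:
                files.append(action["add"]["path"])
            elif "remove" in action:
                files.remove(action["remove"]["path"])
    return files

def scan(storage):
    """Read every row from every file the log says is part of the table."""
    return [row for path in active_files(storage) for row in storage[path]]

storage = {
    "1.parquet": [(1, "Arun"), (2, "Bala")],
    "2.parquet": [(3, "Raj")],
    "_delta_log/0.json": json.dumps({"add": {"path": "1.parquet"}}),
    "_delta_log/1.json": json.dumps({"add": {"path": "2.parquet"}}),
}

rows = scan(storage)  # [(1, 'Arun'), (2, 'Bala'), (3, 'Raj')]
```

Note that the reader never lists the directory for data files; a file that is not referenced by the log is simply not part of the table.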
Why Not Append to the Existing File?
You might wonder why Delta Tables don’t simply append the new row to the existing 1.parquet. There are a few reasons for this:
- Immutability of Parquet Files: Once written, a Parquet file is not modified. The format keeps its metadata in a footer at the end of the file, so adding rows would mean rewriting the whole file rather than appending to it.
- Efficiency: Writing a new file is often more efficient than rewriting an existing file, especially for large datasets.
- Concurrency: Delta Tables are designed to handle concurrent reads and writes. Writing a new file ensures that readers can continue to access the existing data without interruption.
- Time Travel: By creating a new file and updating the transaction log, Delta Tables maintain a history of changes, enabling features like time travel (querying previous versions of the table).
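The time travel point follows directly from keeping old files and old log entries around. As a hedged sketch (the `files_at` and `read` helpers and the in-memory `log` list are hypothetical, not a real API), reading an earlier version just means replaying fewer commits:

```python
def files_at(log, version):
    """Replay commits 0..version and return the data files live at that version."""
    live = []
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                live.append(action["add"]["path"])
            elif "remove" in action:
                live.remove(action["remove"]["path"])
    return live

# Toy stand-ins for the data files and the two commits from the walkthrough.
data = {
    "1.parquet": [(1, "Arun"), (2, "Bala")],
    "2.parquet": [(3, "Raj")],
}
log = [
    [{"add": {"path": "1.parquet"}}],  # commit 0
    [{"add": {"path": "2.parquet"}}],  # commit 1
]

def read(version):
    return [row for path in files_at(log, version) for row in data[path]]

before = read(0)  # [(1, 'Arun'), (2, 'Bala')]
after = read(1)   # [(1, 'Arun'), (2, 'Bala'), (3, 'Raj')]
```

Had the insert modified 1.parquet in place, version 0 could no longer be reconstructed from the files on disk.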
Summary
When a new row is added to a Delta Table:
- A new Parquet file is created to store the new row.
- The transaction log is updated to record the addition of the new file.
- Queries combine data from all relevant Parquet files to provide a consistent view of the table.