The Delta Lake protocol offers ACID properties for managing large datasets stored as files in distributed file systems or object stores. To do so, it uses a multi-version concurrency control (MVCC) mechanism whose core components are an append-only transaction log, optimistic concurrency control for writers, and checkpoints that keep reads efficient.
Writers create new data files and then commit changes by appending a log entry to the transaction log. This log entry details the files to add or remove, along with any metadata modifications. Delta Lake employs optimistic concurrency control (OCC) to handle concurrent writes efficiently: instead of locking data, writers optimistically assume that no conflicts will occur. A writer performs the following steps:
1. Write Phase: The writer creates new data files containing the updated data. These files are written to the storage location but are not yet considered part of the table’s official state.
2. Commit Phase: Once the writer has finished writing all the new data files, it commits the changes by appending a transaction log entry. This entry describes the changes made (files added, files removed, metadata updates) and includes a unique transaction ID.
3. Conflict Detection: The transaction log is used to detect conflicts. If another writer has modified the same data during the first writer’s write phase, a conflict is detected during the commit phase; the committing writer’s transaction is aborted and must be retried. The specific conflict resolution strategy depends on the implementation.
OCC avoids the performance overhead of traditional locking mechanisms, making it well suited to high-concurrency environments. The trade-off is that conflicts can occur and writers need a retry mechanism, as the sketch below illustrates.
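To make the loop concrete, here is a minimal sketch of an optimistic commit, assuming a local filesystem where exclusive file creation (`open(..., "x")`) stands in for the atomic put-if-absent primitive an object store would provide; the retry policy and conflict handling are simplified placeholders.

```python
import json
import os

def commit(table_path, actions, max_retries=10):
    """Optimistically commit `actions` as the next transaction log entry.

    Sketch only: `actions` is a list of JSON-serializable Delta actions,
    and exclusive file creation ("x" mode) plays the role of the atomic
    put-if-absent a real implementation needs on an object store.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    for _ in range(max_retries):
        # Pick the next version by inspecting existing commit files.
        versions = [
            int(name[:-5])
            for name in os.listdir(log_dir)
            if name.endswith(".json") and name[:-5].isdigit()
        ]
        next_version = max(versions, default=-1) + 1
        entry = os.path.join(log_dir, f"{next_version:020d}.json")
        try:
            # Atomic "create if absent": fails if another writer won the race.
            with open(entry, "x") as f:
                for action in actions:
                    f.write(json.dumps(action) + "\n")
            return next_version  # commit succeeded
        except FileExistsError:
            # A concurrent writer claimed this version first. A real writer
            # would check the winning commit for logical conflicts here,
            # then retry against the new log state.
            continue
    raise RuntimeError("too many commit conflicts; giving up")
```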
The transaction log is the central component of Delta Lake’s architecture. It’s an append-only log that records every action performed on the table, and checkpoints within it give readers efficient access by sparing them a scan of the entire log.

Each entry in the log represents a committed transaction and includes the files added or removed, any metadata updates, and transaction identifiers.

The transaction log is crucial for reconstructing the table’s state at any version, detecting conflicts between concurrent writers, and serving readers consistent snapshots.
Readers in Delta Lake always see a consistent snapshot of the data, regardless of concurrent writes. This is achieved by pinning each read to a specific version of the transaction log: the reader resolves the set of data files belonging to that version and ignores any commits that land afterwards. Snapshot isolation thus ensures that readers never see partially written or inconsistent data; each read operation sees a consistent view of the table even while writes are occurring concurrently, as the sketch below shows.
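Here is a minimal sketch of how a reader could reconstruct the set of live data files at a chosen version by replaying `add` and `remove` actions from the commit files. It is a simplification: it ignores checkpoints, deletion vectors, and the other action types.

```python
import json
import os

def live_files_at(table_path, version):
    """Return the set of data file paths visible in the snapshot at `version`.

    Sketch only: replays add/remove actions from JSON commits 0..version.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    live = set()
    for v in range(version + 1):
        commit_file = os.path.join(log_dir, f"{v:020d}.json")
        with open(commit_file) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live

# A reader that resolves its file list this way sees version N exactly,
# no matter what concurrent writers commit as versions N+1, N+2, ...
```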
Instead of immediately removing deleted files, Delta Lake employs lazy deletion. When a file is deleted, it is not physically removed from storage; instead, it is marked as a “tombstone” in the transaction log via a `remove` action. The actual removal of the physical files is performed later by a separate process, typically the `VACUUM` command.

Lazy deletion offers several advantages: deletes commit quickly because no data files are rewritten, readers holding older snapshots can keep reading the removed files, and earlier versions of the table remain reconstructable. Once the retention period has passed, the `VACUUM` command removes the physical files, reclaiming storage space.
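As a usage sketch, assuming the `deltalake` Python package (the delta-rs bindings) and a hypothetical table location; the exact defaults, such as the minimum retention window, are worth checking against the library’s documentation:

```python
from deltalake import DeltaTable

dt = DeltaTable("s3://my-bucket/events")  # hypothetical table location

# Dry run first: list the tombstoned files that would be deleted.
candidates = dt.vacuum(retention_hours=168, dry_run=True)
print(candidates)

# Actually delete files older than the 7-day retention window.
dt.vacuum(retention_hours=168, dry_run=False)
```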
Checkpoints are periodic snapshots of the transaction log. They contain a summarized representation of the log entries up to a specific point in time, giving readers a more efficient way to reconstruct the table’s state at a given version without scanning the entire log. Different checkpoint types exist (UUID-named, classic, and deprecated multi-part), each with its own characteristics.
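As a sketch of how a reader takes advantage of this, the `_last_checkpoint` file can be consulted first so that only commits after the checkpoint need replaying; the field names follow the Delta protocol’s `_last_checkpoint` format, and error handling is omitted.

```python
import json
import os

def latest_checkpoint_version(table_path):
    """Return the version of the most recent checkpoint, or None if absent."""
    pointer = os.path.join(table_path, "_delta_log", "_last_checkpoint")
    if not os.path.exists(pointer):
        return None  # no checkpoint yet: replay the log from version 0
    with open(pointer) as f:
        info = json.load(f)
    # `version` is the checkpointed version; `size` is its action count.
    return info["version"]
```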
Delta Lake uses several file types to manage its data and metadata efficiently. These files are organized within the table’s directory structure:

- Data files: The Parquet files holding the table’s rows, optionally laid out in partition directories (e.g., `part1=value1/part2=value2/...`). The partitioning is for organizational purposes and isn’t strictly required by the protocol itself; the actual partition values are tracked in the transaction log.
- Change data files: Stored in the `_change_data` directory, these files record changes made to the table in a specific version. This is useful for change data capture (CDC) applications, allowing efficient tracking of data modifications over time.
- Delta log entries: Stored in the `_delta_log` directory, these JSON files record every action performed on the table. This log is crucial for maintaining the table’s history and enabling consistent reads and writes.
- Checkpoints: Also in `_delta_log`, checkpoints are snapshots of the table’s state at a specific version. They provide a more efficient way to read the table’s data than processing the entire transaction log, significantly improving read performance.
- Log compaction files: Also in `_delta_log`, these files aggregate actions from a range of commits. They help reduce the size and improve the efficiency of the transaction log over time.
- Last checkpoint file: Located at `_delta_log/_last_checkpoint`, this file points to the most recent checkpoint, providing quick access to the latest table state.
- Version checksum file: Stored in `_delta_log`, this file contains checksums to verify the integrity of the table’s data and metadata.

The state of a Delta table at any given version (its snapshot) is defined by the protocol in effect, the latest table metadata, the set of live data files that results from replaying the log’s `add` and `remove` actions up to that version, and any tombstones still awaiting vacuum.
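As a concrete illustration of this layout, here is a minimal sketch that groups `_delta_log` entries by role purely from their names, using the conventions above (zero-padded `<version>.json` commits; classic `<version>.checkpoint.parquet` checkpoints; other checkpoint variants and auxiliary files fall into the last bucket):

```python
import os
import re

COMMIT = re.compile(r"^\d{20}\.json$")
CLASSIC_CHECKPOINT = re.compile(r"^\d{20}\.checkpoint\.parquet$")

def classify_log_files(table_path):
    """Group _delta_log entries by role, judging purely by file name."""
    log_dir = os.path.join(table_path, "_delta_log")
    groups = {"commits": [], "checkpoints": [], "other": []}
    for name in sorted(os.listdir(log_dir)):
        if COMMIT.match(name):
            groups["commits"].append(name)
        elif CLASSIC_CHECKPOINT.match(name):
            groups["checkpoints"].append(name)
        else:
            # _last_checkpoint, checksum files, UUID/multi-part checkpoints...
            groups["other"].append(name)
    return groups
```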
Actions are the fundamental operations that modify the table’s state. They are recorded in the Delta log and checkpoints. Key actions include:
- `metaData`: Modifies the table’s metadata (schema, properties, etc.).
- `add` and `remove`: Add or remove data files or other files from the table.
- `addCDCFile`: Adds a change data file.
- `txn`: Records application-specific transaction identifiers, linking actions to specific transactions.
- `protocol`: Updates the protocol version used by the table.
- `commitInfo`: Stores information about the commit operation, such as timestamps and committer information.
- `domainMetadata`: Allows for storing custom metadata within named domains.
- `sidecar`: References sidecar files containing file actions (used in V2 checkpoints).
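To ground these in the on-disk format, here is a sketch of the actions one commit entry might carry, serialized one JSON object per line as in a real `_delta_log` entry. The values (timestamps, file names) are illustrative, while the field names follow the protocol’s action schema.

```python
import json

# Illustrative actions for one commit: commit metadata, one file added,
# one file logically removed (a tombstone).
actions = [
    {"commitInfo": {"timestamp": 1700000000000, "operation": "MERGE"}},
    {"add": {
        "path": "part-00000-a1b2c3.snappy.parquet",  # hypothetical file name
        "partitionValues": {},
        "size": 1024,
        "modificationTime": 1700000000000,
        "dataChange": True,
    }},
    {"remove": {
        "path": "part-00000-old.snappy.parquet",  # hypothetical file name
        "deletionTimestamp": 1700000000000,
        "dataChange": True,
    }},
]

# One JSON object per line, exactly how a commit file is laid out.
print("\n".join(json.dumps(a) for a in actions))
```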
Delta Lake’s table features extend its core functionality, providing advanced capabilities and compatibility with various data processing tools and workflows. These features are enabled on a per-table basis and are specified within the table’s metadata; they are controlled through the `readerFeatures` and `writerFeatures` lists within the `protocol` action in the transaction log. Readers use this information to correctly interpret and process the data. The specific requirements for writers and readers vary depending on the feature. For example, a reader might need to support deletion vectors to correctly handle deleted rows, while a writer might need to generate appropriate metadata for clustered tables.
The Delta Lake specification details the requirements for each feature, ensuring interoperability between different implementations and tools. The features are designed to be modular and extensible, allowing for future additions and improvements without breaking compatibility with existing systems. They provide a powerful mechanism for customizing Delta Lake tables to meet the specific needs of different applications and workflows.
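As an illustration of how a client might act on these lists, here is a sketch that refuses to read a table requiring reader features the client lacks. The `SUPPORTED_READER_FEATURES` set is a hypothetical client capability list, while the feature names themselves are real Delta table features.

```python
# Features this hypothetical client knows how to read.
SUPPORTED_READER_FEATURES = {"deletionVectors", "columnMapping", "timestampNtz"}

def check_readable(protocol_action):
    """Refuse to read a table that requires reader features we lack."""
    required = set(protocol_action.get("readerFeatures", []))
    missing = required - SUPPORTED_READER_FEATURES
    if missing:
        raise RuntimeError(
            f"cannot read table; unsupported reader features: {missing}"
        )

# Example protocol action as it might appear in the transaction log.
check_readable({
    "minReaderVersion": 3,
    "minWriterVersion": 7,
    "readerFeatures": ["deletionVectors"],
    "writerFeatures": ["deletionVectors", "appendOnly"],
})
```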
Let’s look at one key table feature as an example: the vacuum protocol check (`vacuumProtocolCheck`). It ensures that the `VACUUM` command (used for removing tombstones) adheres to the Delta Lake protocol, which helps maintain data consistency and prevents accidental data loss during cleanup, adding a layer of safety to the `VACUUM` operation.

Delta Lake uses a sophisticated combination of a transaction log, MVCC, and checkpoints to provide a robust and scalable solution for managing large datasets with ACID properties.