.crc
files live alongside the JSON commit files in the _delta_log directory and are used to verify their integrity; each .crc file contains checksum metadata about the corresponding write.
df.write.format("delta").partitionBy("region").save("/mnt/tables/sales")
- This will write the DataFrame to /mnt/tables/sales as a Delta table partitioned by the region column.
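The partitioned table written above can then be queried by path. A minimal sketch, assuming a Spark SQL session with Delta Lake; the region value 'EMEA' is illustrative:
SELECT COUNT(*) AS row_count
FROM delta.`/mnt/tables/sales`
WHERE region = 'EMEA';  -- partition filter: Spark prunes to the region=EMEA directory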
DESCRIBE
command returns the column-level metadata of an existing table: column names, data types, and comments.
DESCRIBE EXTENDED
command returns the column-level metadata plus detailed table information such as the owner, location, provider, and table properties.
DESCRIBE FORMATTED
command returns the same detailed metadata as DESCRIBE EXTENDED (the two are interchangeable in Spark SQL).
DESCRIBE DETAIL
command returns a one-row summary of a Delta table, including its format, location, partition columns, number of files, and size in bytes.
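A quick sketch of the variants side by side, using a hypothetical table named sales:
DESCRIBE sales;           -- col_name, data_type, comment for each column
DESCRIBE EXTENDED sales;  -- same, plus a "# Detailed Table Information" section
DESCRIBE DETAIL sales;    -- single row: format, location, partitionColumns, numFiles, sizeInBytes, ...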
DESCRIBE HISTORY
command returns the transaction history of a Delta table: one row per version, with the version number, the timestamp of the transaction, the operation performed (e.g., WRITE, UPDATE, DELETE), and operation metrics such as the files added to or removed from the table.
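A sketch against the same hypothetical sales table; the version number is arbitrary:
DESCRIBE HISTORY sales;               -- version, timestamp, userName, operation, operationParameters, operationMetrics
SELECT * FROM sales VERSION AS OF 3;  -- time travel to a version listed in the history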
VACUUM on a Delta table
VACUUM removes all files from the table directory that are no longer referenced by the latest state of the transaction log and that are older than a retention threshold (default 168 hours = 7 days).
VACUUM my_table RETAIN 0 HOURS
- This will delete all files from the table directory that are no longer referenced by the latest state of the transaction log, regardless of their age (retention threshold of 0 hours).
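Note that Delta normally rejects a retention window below the configured threshold. A hedged sketch of the usual override, where spark.databricks.delta.retentionDurationCheck.enabled is a Spark session setting:
SET spark.databricks.delta.retentionDurationCheck.enabled = false;  -- disable the retention safety check
VACUUM my_table RETAIN 0 HOURS;  -- now permitted; irreversibly deletes all unreferenced files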
Append the DRY RUN flag to the end of the VACUUM command to see which files would be deleted without actually deleting them.
VACUUM my_table RETAIN 0 HOURS DRY RUN
- This will show you what files would be deleted without actually deleting them.
After running VACUUM on a Delta table, you lose the ability to time travel back to versions older than the specified data retention period. Each time a checkpoint is written (by default every 10 commits, as a Parquet file saved in _delta_log), Databricks automatically cleans up log entries older than the retention interval.
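Both retention windows are tunable per table; a sketch with illustrative interval values:
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 7 days',  -- how long VACUUM keeps unreferenced data files
  'delta.logRetentionDuration' = 'interval 30 days'          -- how long log entries (and thus time travel) are kept
);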
Q: Why was Delta Lake named Delta Lake?
A: The name refers to capturing changes (delta) within a data lake environment. The name is concise, memorable, and accurately reflects its purpose and capabilities.