Deletion vectors (DVs) in Delta Lake optimize DELETE operations by marking deleted rows logically in metadata rather than rewriting the underlying Parquet files. When a DELETE statement runs, the affected rows are recorded in a deletion vector referenced from the transaction log; the data remains in the original files but is filtered out at read time. This improves performance for workloads with frequent deletes and updates, since file rewrites are deferred. Physical removal of the data only happens later, when the files are rewritten and a VACUUM command purges the obsolete ones. The Databricks documentation confirms: “With deletion vectors, deleted rows are marked in metadata and skipped at read time, avoiding file rewrites.” Thus, rows are marked as deleted in metadata, not removed from the files.
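As a minimal sketch of this behavior, the statements below enable deletion vectors on a table and then run a logical delete followed by a later physical cleanup. The table name `events` and the predicate are hypothetical; `delta.enableDeletionVectors` is the Delta table property that turns the feature on.

```sql
-- Enable deletion vectors on an existing Delta table (hypothetical table name)
ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

-- Logical delete: matching rows are marked in a deletion vector in the
-- transaction log; the Parquet files are NOT rewritten at this point
DELETE FROM events WHERE event_date < '2023-01-01';

-- Queries now skip the marked rows at read time, but the bytes still
-- exist in the underlying files until a later rewrite plus VACUUM
VACUUM events;
```

Note that VACUUM only removes files no longer referenced by the table and older than the retention threshold, so the physical cleanup of soft-deleted rows is deferred, exactly as the explanation above describes.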