- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-25-2021 07:01 AM
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-25-2021 07:02 AM
COPY INTO command internally uses key-value store - RocksDB to store the details of the input files. This information is stored inside the Delta table log directory. This acts like the checkpointing information for a streaming query. Next time a COPY INTO command is triggered on the same table, as a first step, the data from the RocksDB is loaded and compared against the input files. Under the hood, a dedupe logic is performed to ensure idempotency.
More details here:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html
For COPY_OPTIONS, the parameter force if set to 'true', idempotency is disabled and files are loaded regardless of whether they’ve been loaded before.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-25-2021 07:02 AM
COPY INTO command internally uses key-value store - RocksDB to store the details of the input files. This information is stored inside the Delta table log directory. This acts like the checkpointing information for a streaming query. Next time a COPY INTO command is triggered on the same table, as a first step, the data from the RocksDB is loaded and compared against the input files. Under the hood, a dedupe logic is performed to ensure idempotency.
More details here:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html
For COPY_OPTIONS, the parameter force if set to 'true', idempotency is disabled and files are loaded regardless of whether they’ve been loaded before.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-01-2023 07:27 AM
How does COPY_INTO work with table restore?
I made some tests, and the restore method does NOT restore the key-store values of the target at the specific version, which means that the data that came after the chosen version cannot be inserted (unless forced).
Is this behavior intended?

