How is idempotency ensured for the COPY INTO command?

User16869510359
Esteemed Contributor
 
1 ACCEPTED SOLUTION


User16869510359
Esteemed Contributor

The COPY INTO command internally uses a key-value store, RocksDB, to record details of the input files. This information is stored inside the Delta table's log directory and acts like the checkpoint information of a streaming query. The next time COPY INTO is triggered on the same table, its first step is to load the data from RocksDB and compare it against the input files. Under the hood, deduplication logic skips files that are already recorded, which is what ensures idempotency.
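To make the idea concrete, here is a minimal sketch of the file-tracking logic in plain Python. It is only a toy model: the `ledger` set stands in for the RocksDB store, the function name `copy_into` and the file names are made up for illustration, and keying on the file path alone is an assumption, not a statement about what Databricks actually stores.

```python
from typing import List, Set


def copy_into(ledger: Set[str], input_files: List[str]) -> List[str]:
    """Toy model of an idempotent load.

    ledger:      hypothetical stand-in for the per-table key-value store
                 of files that were already loaded.
    input_files: files currently present at the source location.
    Returns the files this run actually loads.
    """
    # Compare the input files against what the ledger already knows.
    new_files = [f for f in input_files if f not in ledger]
    # Record the newly loaded files so a later run skips them.
    ledger.update(new_files)
    return new_files


ledger: Set[str] = set()
first = copy_into(ledger, ["a.csv", "b.csv"])            # loads both files
second = copy_into(ledger, ["a.csv", "b.csv"])           # nothing new, loads nothing
third = copy_into(ledger, ["a.csv", "b.csv", "c.csv"])   # loads only c.csv
```

Running the same load twice changes nothing the second time, which is the idempotency property described above.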

More details here: 

https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html

In COPY_OPTIONS, setting the force parameter to 'true' disables idempotency: files are loaded regardless of whether they've been loaded before.
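As a rough illustration of the force option, the sketch below adds a `force` flag to a toy Python model of the load. The table name and source path in the statement string are placeholders, the `ledger` set is a hypothetical stand-in for the key-value store, and whether a forced load also records the files is an assumption made here, not something the documentation above confirms.

```python
from typing import List, Set

# Example statement using the force option; the table name and the
# source path are hypothetical placeholders.
FORCE_STATEMENT = """
COPY INTO my_table
FROM '/path/to/input'
FILEFORMAT = CSV
COPY_OPTIONS ('force' = 'true')
"""


def copy_into(ledger: Set[str], input_files: List[str], force: bool = False) -> List[str]:
    """Toy model: with force=True the dedupe check is bypassed."""
    if force:
        to_load = list(input_files)  # load everything, seen or not
    else:
        to_load = [f for f in input_files if f not in ledger]
    # Assumption: forced loads are recorded in the ledger as well.
    ledger.update(to_load)
    return to_load


ledger: Set[str] = set()
copy_into(ledger, ["a.csv"])                         # initial load
skipped = copy_into(ledger, ["a.csv"])               # already loaded, skipped
reloaded = copy_into(ledger, ["a.csv"], force=True)  # loaded again
```

Note that a forced reload of the same file would duplicate its rows in the target table, so it is worth using deliberately.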


2 REPLIES

N_M
New Contributor III

How does COPY INTO work with table restore?

I ran some tests, and restoring the table does NOT restore the key-value store of the target to its state at the chosen version. This means that data loaded after the chosen version cannot be inserted again (unless forced).

Is this behavior intended?
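For anyone trying to picture the behavior reported here, this is a toy Python model of it. Everything in it is hypothetical (the `ToyTable` class, the file names, the versioning scheme); it only encodes the observation that a restore rolls back the table data while the record of loaded files stays as it was.

```python
from typing import List, Set


class ToyTable:
    """Toy model: versioned data plus a separate record of loaded files."""

    def __init__(self) -> None:
        self.versions: List[List[str]] = [[]]  # version 0 is the empty table
        self.ledger: Set[str] = set()          # stand-in for the key-value store

    def copy_into(self, input_files: List[str], force: bool = False) -> List[str]:
        to_load = [f for f in input_files if force or f not in self.ledger]
        self.ledger.update(to_load)
        if to_load:
            # Each successful load creates a new table version.
            self.versions.append(self.versions[-1] + to_load)
        return to_load

    def restore(self, version: int) -> None:
        # Only the data is rolled back; the ledger is left untouched,
        # which mirrors the behavior observed in the tests above.
        self.versions.append(list(self.versions[version]))


table = ToyTable()
table.copy_into(["a.csv"])                        # version 1: a.csv
table.copy_into(["a.csv", "b.csv"])               # version 2: a.csv, b.csv
table.restore(1)                                  # data is back to a.csv only
retry = table.copy_into(["a.csv", "b.csv"])       # b.csv is still in the ledger
forced = table.copy_into(["b.csv"], force=True)   # forcing brings b.csv back
```

In this model `retry` loads nothing even though b.csv is no longer in the table, and only the forced load reinserts it.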