Can Spark JDBC create duplicate records?

User16869510359
Esteemed Contributor

Is it transaction safe? Does it ensure atomicity?

1 ACCEPTED SOLUTION

User16869510359
Esteemed Contributor

Atomicity is ensured at the task level, not at the stage level. If a stage is retried for any reason, tasks that have already completed the write will run again and produce duplicate records. This is expected behavior by design.

When Apache Spark performs a JDBC write, each partition of the DataFrame is written to the SQL table by its own task, generally as a single JDBC transaction so data is not inserted repeatedly. However, if the job fails after some of those transactions have committed but before the final stage completes, the retried tasks can copy duplicate data into the SQL table.
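
To make the failure mode concrete, here is a minimal sketch of a partition-per-transaction JDBC write in PySpark. The JDBC URL, table name, and credentials are placeholders, not values from the original post.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-write-example").getOrCreate()

df = spark.range(0, 1_000_000)  # example DataFrame with a single "id" column

# Each partition of df is written by its own task, typically in its own
# JDBC transaction. If the stage is retried after some tasks have already
# committed, those partitions are written a second time, creating duplicates.
(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://<host>:1433;database=<db>")  # placeholder
   .option("dbtable", "dbo.target_table")                        # placeholder
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())
```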

Verify that speculative execution is disabled in your Spark configuration: spark.speculation false (this is the default). Speculative execution launches duplicate attempts of slow tasks, which increases the chance of retried writes.
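
A quick way to check is to set and read the property when the SparkSession is built; this sketch assumes you control the session configuration:

```python
from pyspark.sql import SparkSession

# Explicitly disable speculative execution (this is also the default),
# so no duplicate speculative task attempts run the JDBC write.
spark = (SparkSession.builder
         .appName("jdbc-write-no-speculation")
         .config("spark.speculation", "false")
         .getOrCreate())

# Sanity check: prints "false" when the setting is in effect.
print(spark.sparkContext.getConf().get("spark.speculation", "false"))
```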

A potential workaround is to write the data to a temporary staging table first and then MERGE it into the target table.
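
Below is a sketch of that staging-table workaround. The connection details, table and column names, and the use of pyodbc to issue the final MERGE are illustrative assumptions, not part of the original answer; any client that can run SQL against the target database would work, and the MERGE syntax shown is SQL Server's.

```python
import pyodbc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-merge-workaround").getOrCreate()
df = spark.range(0, 1_000_000)  # example DataFrame with a single "id" column

# 1. Overwrite a staging table. Overwriting keeps the Spark write idempotent:
#    a stage retry simply rewrites the same staging data.
(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://<host>:1433;database=<db>")  # placeholder
   .option("dbtable", "dbo.staging_table")                       # placeholder
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("overwrite")
   .save())

# 2. MERGE the staging table into the target table in a single transaction,
#    so each key is inserted at most once.
merge_sql = """
MERGE dbo.target_table AS t
USING dbo.staging_table AS s
    ON t.id = s.id
WHEN NOT MATCHED THEN
    INSERT (id) VALUES (s.id);
"""
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<host>;DATABASE=<db>;UID=<user>;PWD=<password>"
)
conn.execute(merge_sql)
conn.commit()
conn.close()
```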
