10-28-2025 08:55 AM
@jorperort When you write from Databricks to a SQL Server table with a composite primary key over JDBC, unique constraint violations are often caused by Spark's distributed retry logic: a retried task can try to insert rows that an earlier attempt already wrote. See https://docs.databricks.com/aws/en/archive/connectors/jdbc
Solutions
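Write to Staging Table and Use MERGE:
The recommended approach is to write each batch to a temporary or staging table in SQL Server, then run a database-level MERGE (upsert) from the staging table into the target table, so rows whose key already exists are updated instead of raising a key violation.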
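As a rough sketch of the MERGE step, the helper below builds a T-SQL MERGE statement for a composite key. The function name `build_merge_sql` and the table and column names in the usage note are hypothetical, chosen only for illustration; how you execute the statement against SQL Server (for example through a SQL Server client library) is left out here.

```python
def build_merge_sql(target, staging, key_cols, value_cols):
    """Return a T-SQL MERGE that upserts rows from `staging` into `target`.

    All names are passed in by the caller; nothing here is specific to
    the original poster's schema.
    """
    if not key_cols:
        raise ValueError("at least one key column is required")

    def q(col):
        # Bracket-quote identifiers the SQL Server way.
        return f"[{col}]"

    # Match on every column of the composite primary key.
    on_clause = " AND ".join(f"t.{q(c)} = s.{q(c)}" for c in key_cols)
    all_cols = list(key_cols) + list(value_cols)
    insert_cols = ", ".join(q(c) for c in all_cols)
    insert_vals = ", ".join(f"s.{q(c)}" for c in all_cols)

    lines = [
        f"MERGE INTO {target} AS t",
        f"USING {staging} AS s",
        f"ON {on_clause}",
    ]
    if value_cols:
        # Only emit an UPDATE branch when there is something to update.
        set_clause = ", ".join(f"t.{q(c)} = s.{q(c)}" for c in value_cols)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    lines.append(
        f"WHEN NOT MATCHED BY TARGET THEN "
        f"INSERT ({insert_cols}) VALUES ({insert_vals});"
    )
    return "\n".join(lines)
```

For example, `build_merge_sql("dbo.orders", "dbo.orders_staging", ["order_id", "line_no"], ["qty"])` produces a statement that matches on both key columns. Note that the staging table must itself hold at most one row per key, otherwise SQL Server rejects the MERGE for touching the same target row more than once.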
Tune Write Parallelism:
Adjust numPartitions and batchsize, and set the transaction isolation level through the JDBC options, to reduce retry-related conflicts. See the official options and guidance on parallelism:
https://docs.databricks.com/aws/en/archive/connectors/jdbc#control-parallelism-for-jdbc-queries
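A minimal sketch of how these settings might be collected in one place. `numPartitions`, `batchsize` and `isolationLevel` are Spark JDBC data source options; the helper name `jdbc_write_options` and the default values are assumptions for illustration, not tuning recommendations.

```python
# Isolation levels accepted by Spark's JDBC writer option `isolationLevel`.
VALID_ISOLATION_LEVELS = {
    "NONE",
    "READ_UNCOMMITTED",
    "READ_COMMITTED",
    "REPEATABLE_READ",
    "SERIALIZABLE",
}


def jdbc_write_options(url, table, num_partitions=4, batch_size=1000,
                       isolation_level="READ_COMMITTED"):
    """Build an options dict for a Spark JDBC write.

    The defaults are placeholders; pick values that suit your cluster
    and your SQL Server instance.
    """
    if num_partitions < 1 or batch_size < 1:
        raise ValueError("num_partitions and batch_size must be positive")
    if isolation_level not in VALID_ISOLATION_LEVELS:
        raise ValueError(f"unsupported isolation level: {isolation_level}")
    return {
        "url": url,
        "dbtable": table,
        # Upper bound on concurrent JDBC connections used for the write.
        "numPartitions": str(num_partitions),
        # Rows sent per JDBC batch insert.
        "batchsize": str(batch_size),
        "isolationLevel": isolation_level,
    }


# Usage inside a Databricks notebook (credentials omitted on purpose;
# the URL and table name are hypothetical):
# opts = jdbc_write_options("jdbc:sqlserver://<host>:1433;databaseName=<db>",
#                           "dbo.orders_staging")
# df.write.format("jdbc").options(**opts).mode("append").save()
```

Fewer partitions mean fewer concurrent writers, which tends to make conflicts easier to reason about at the cost of throughput.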
Validate DataFrame for Duplicates:
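Always call .dropDuplicates() with the list of primary key columns on the DataFrame before writing, so that the batch itself never contains two rows with the same composite key.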
https://docs.databricks.com/aws/en/archive/connectors/jdbc
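The deduplication step can be sketched as a small wrapper around `DataFrame.dropDuplicates`. The helper name `dedupe_on_keys` and the column names in the usage comment are hypothetical.

```python
def dedupe_on_keys(df, key_cols):
    """Keep one row per composite key.

    `df` is expected to be a Spark DataFrame. Which of several duplicate
    rows survives is not deterministic, so if duplicates can differ in
    their non-key columns, decide explicitly which one to keep first.
    """
    key_cols = list(key_cols)
    if not key_cols:
        raise ValueError("at least one key column is required")
    return df.dropDuplicates(key_cols)


# Usage in a notebook (column names are placeholders):
# clean_df = dedupe_on_keys(df, ["order_id", "line_no"])
```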
SQL Server's IGNORE_DUP_KEY option can sometimes help, but since yours is OFF, conflicts are not ignored.
Databricks guidance on the JDBC driver:
https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/sql-server-source-setup