01-24-2022 11:30 AM
I'm using databricks to connect to a SQL managed instance via JDBC. SQL operations I need to perform include DELETE, UPDATE, and simple read and write. Since spark syntax only handles simple read and write, I had to open SQL connection using Scala and perform DELETE and UPDATE queries.
Here's a sample scala code I use to execute delete queries:
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val statement = connection.createStatement()
val queryStr = "DELETE FROM SAMPLE"
val res = stmt.execute(queryStr)
connection.close()
These lines work perfectly fine if I run one notebook at a time. However, when I run several notebooks in parallel, I can get into deadlock issues (see below)
How can I resolve this error?
01-25-2022 07:29 AM
the issue is not your code but the fact that you run the queries in parallel. The SQL server database cannot handle that for some reason.
f.e. one notebook run is doing an update while another wants to delete that record.
01-25-2022 05:39 AM
this is not a spark error but purely the database.
There are tons of articles online on how to prevent deadlocks, but there is no single solution for this.
01-25-2022 07:25 AM
I'm not a fluent Scala user. Do you happen to know one solution that deals with JDBC in Scala?
01-25-2022 07:29 AM
the issue is not your code but the fact that you run the queries in parallel. The SQL server database cannot handle that for some reason.
f.e. one notebook run is doing an update while another wants to delete that record.
01-25-2022 07:52 AM
Got it! Thank you so much! It looks like I can use error handling to rerun the deadlock victim until it works. Thanks for pointing me to the right direction!
10-20-2024 06:14 AM
You have manged time and stp step need person operation so deadlock can avoided. This is purly database proplem which can avoid in making time difference or short transcation in database operation.
10-21-2024 04:56 AM - edited 10-21-2024 04:58 AM
@swzzzsw
Since you are performing database operations, to reduce the chances of deadlocks, make sure to wrap your SQL operations inside transactions using commit and rollback.
Another approachs to consider is adding retry logic or using Isolation Levels. For more information, refer to the Databricks documentation on isolation levels ( Isolation Levels Documentation )
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group