Azure Databricks Jobs failed intermittently due to connection timeout (Read Failed) while executing a MS SQL stored procedure which is in Azure SQL database.
My requirement is to process delta records(Get delta records using last refresh date) from DataLake(We don not have access to DataLake. It is currently a black box for us. We have only granted to mount it and query PARQUET files) into a staging table in MS SQL database, then run SQL merge logic through stored procedure. Some millions of records will get updated through merge logic for every job run, so it will take around 10 mins to finish the SPROC execution. Sometimes it completes successfully, sometimes it get failed due to connection time out.
There are approximately 20 datasets processing from data lake to MS SQL, single notebook is dedicated to each dataset.
I am using below JDBC logic to execute the SPROC. Databricks Jobs are using high configured clusters of 144GB,72Core driver and 6 workers same as driver.
val props = new Properties()
props.put("encryptionmethod","SSL")
props.put("accesstoken", accessToken)
val conn = DriverManager.getConnection(jdbcurl, props)
try{
conn.createStatement.execute("EXEC [stg].[usp_MergePurchaseOrder]")
}
catch{
case ex:Exception => throw ex
}
finally{
conn.close()
}
This is blocking us to move to production. Please help me. My whole team is new to Databricks concepts.