Speed Up JDBC Write from Databricks Notebook to MS...

berserkersap · ‎09-16-2023

Hello Everyone,

I have a use case where I need to write a delta table from DataBricks to a SQL Server Table using Pyspark/ python/ spark SQL .

The delta table I am writing contains around 3 million records and the SQL Server Table is neither partitioned nor indexed. It is taking around 8-10 min to write into the SQL Server Table no matter what compute I use (using default JDBC driver).

The code being used:

df.write.mode("overwrite").option("truncate", True ).jdbc(Url, "dbtable" , properties = properties )

I have executed this code on DataBricks runtime 11.3LTS using the compute E4ds v4 and E16ds v4 but in both the cases it took 8-10 min.

Can anyone please suggest a way to reduce this time ?

P.S. I have tried to increase the batch size but even that was not helping. If it was more than 20000 the process became even slower

Also, I cannot install anything on the cluster according to the client's requirement. But if this is absolutely necessary please let me know what to install and how to install.

I have tried using the driver "com.microsoft.sqlserver.jdbc.spark", however, it gave me an error

java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark.

Thank You

Speed Up JDBC Write from Databricks Notebook to MS SQL Server