Hello Everyone,
I have a use case where I need to write a Delta table from Databricks to a SQL Server table using PySpark / Python / Spark SQL.
The Delta table I am writing contains around 3 million records, and the target SQL Server table is neither partitioned nor indexed. The write takes around 8-10 minutes no matter what compute I use (with the default JDBC driver).
The code being used:
df.write.mode("overwrite").option("truncate", True).jdbc(Url, "dbtable", properties=properties)
I have executed this code on Databricks Runtime 11.3 LTS using E4ds v4 and E16ds v4 compute, but in both cases it took 8-10 minutes.
Can anyone please suggest a way to reduce this time?
P.S. I have tried increasing the batch size, but that did not help either; anything above 20000 actually made the process slower.
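For reference, the batch size attempt looked roughly like this (the exact values varied; Url, "dbtable", and properties are the same placeholders as above):

(df.write
    .mode("overwrite")
    .option("truncate", True)
    .option("batchsize", 20000)  # also tried smaller and larger values
    .jdbc(Url, "dbtable", properties=properties))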
Also, per the client's requirements I cannot install anything on the cluster. If an installation is absolutely necessary, please let me know what to install and how to install it.
I have also tried using the driver "com.microsoft.sqlserver.jdbc.spark"; however, it gave me this error:
java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark.
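For reference, that attempt looked roughly like this (connection details are placeholders, and I am assuming the connector's documented url/dbtable/user/password options):

(df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("overwrite")
    .option("truncate", True)
    .option("url", Url)
    .option("dbtable", "dbtable")
    .option("user", user)          # placeholder credentials
    .option("password", password)
    .save())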
Thank You