Data Engineering

Out of Memory/Connection Lost When Writing to External SQL Server from Databricks Using JDBC Connection

Megan05
New Contributor III

I am working on writing a large amount of data from Databricks to an external SQL Server using a JDBC connection. I keep getting timeout/connection-lost errors, but digging deeper it appears to be a memory problem. I am wondering what cluster configuration I may need and where it would be best to cache my data. The input is about 60 GB of data that is reduced to 60 million rows. The process writes about 1 million rows to the external database and then crashes.

I have tried different cluster configurations (memory optimized, compute optimized, etc.). I have also tried different garbage collection settings, as the garbage collection metric is dark red during the process.
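For context, a minimal PySpark sketch of the kind of plain JDBC write described above. The source path, connection URL, table name, and secret scope are placeholders, not details from the post:

# Minimal sketch of a plain JDBC write from Databricks to SQL Server.
# All names below (path, URL, table, secret scope) are placeholders.
df = spark.read.parquet("/mnt/raw/large_dataset")  # ~60 GB input

(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
    .option("dbtable", "dbo.target_table")
    .option("user", dbutils.secrets.get("my-scope", "sql-user"))
    .option("password", dbutils.secrets.get("my-scope", "sql-password"))
    .mode("append")
    .save())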

1 ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III

Please adjust the number of dataframe partitions using coalesce(<N>) or repartition(<N>). In most cases this should solve the issue automatically, as the write will happen in chunks per partition.

In addition, these JDBC connection properties can help (see JDBC To Other Databases - Spark 3.3.0 Documentation (apache.org)):

numPartitions

batchsize

isolationLevel
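Putting that together, a sketch of how the repartition plus these options might look, continuing from the dataframe above. The partition count, batch size, and isolation level are illustrative values, not recommendations from this thread:

# Sketch of the suggestion above: repartition so the write happens in
# smaller per-partition chunks, then tune the JDBC writer options.
(df.repartition(200)                # more, smaller partitions to write
    .write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
    .option("dbtable", "dbo.target_table")
    .option("numPartitions", 200)   # cap on parallel JDBC connections
    .option("batchsize", 10000)     # rows per insert round trip (Spark default is 1000)
    .option("isolationLevel", "READ_COMMITTED")  # one of the standard JDBC levels
    .mode("append")
    .save())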


3 REPLIES


Megan05
New Contributor III

Thanks for your response, Hubert! That seemed to fix the timeout issue.

Hubert-Dudek
Esteemed Contributor III

Great to hear. If possible, please select my answer as the best one.

hotrabattecom
New Contributor II

Thanks for the answer. I am also running into this problem.

Hotrabatt
