Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Performance issue while loading bulk data into a Postgres DB from Databricks

Phani1
Valued Contributor II

We are facing a performance issue while loading bulk data into a Postgres DB from Databricks. We are using Spark JDBC connections to move the data, but the transfer rate is very low, which is causing a performance bottleneck. Is there a better approach to this task?

3 REPLIES

daniel_sahal
Esteemed Contributor

@Janga Reddy​ 

I remember that we had this kind of question before. Switching to another library partially solved the issue.

https://community.databricks.com/s/question/0D58Y00009ia8JpSAI/getting-error-while-loading-parquet-d...

Anonymous
Not applicable

Hi @Janga Reddy​ 

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

User16502773013
Databricks Employee

Hello @Janga Reddy​ @Daniel Sahal​ and @Vidula Khanna​ ,

To improve performance in general, we need to design for more parallelism. In the Spark JDBC context, this is controlled by the number of partitions of the data being written.

The linked example shows how to control parallelism while writing, which in that example is driven by numPartitions set during the read. numPartitions is a Spark JDBC option, but the same effect can be achieved on any DataFrame by calling repartition before the write (see the linked repartition documentation).
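As an illustration of the repartition approach described above, here is a minimal sketch in Python. It is not taken from the linked example. The helper names `jdbc_write_options` and `write_to_postgres`, the default values, and the connection details in the usage comment are all hypothetical. `numPartitions` and `batchsize` are standard Spark JDBC options; `batchsize` is not mentioned in this thread and is included only as a related knob that may be worth testing.

```python
def jdbc_write_options(url, table, user, password,
                       num_partitions=8, batch_size=10000):
    """Build the option map for a Spark JDBC write to Postgres.

    The defaults are illustrative assumptions, not recommendations.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Upper bound on parallel JDBC connections used for the write.
        "numPartitions": str(num_partitions),
        # Rows sent per JDBC batch insert (assumption: worth tuning).
        "batchsize": str(batch_size),
    }


def write_to_postgres(df, options, mode="append"):
    """Repartition a Spark DataFrame, then write it over JDBC.

    Each partition is written by its own task over its own connection,
    so the partition count sets the write parallelism.
    """
    num_partitions = int(options["numPartitions"])
    (df.repartition(num_partitions)
       .write.format("jdbc")
       .options(**options)
       .mode(mode)
       .save())


# Usage on a cluster (hypothetical host, table, and credentials):
# opts = jdbc_write_options("jdbc:postgresql://<host>:5432/<db>",
#                           "public.target_table", "<user>", "<password>",
#                           num_partitions=16)
# write_to_postgres(source_df, opts)
```

Starting with a modest partition count and increasing it while watching load on the database is one reasonable way to find a workable setting.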

It is worth mentioning that parallel reads/writes can put pressure on the RDBMS (Postgres in this case). While the Spark write can happen in parallel, the sizing, capacity, and connectivity of the destination database should be evaluated and taken into account when choosing the degree of parallelism.
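One way to make the point about database capacity concrete is to cap the number of write partitions at a connection budget for the target database. The sketch below is only an illustration: the function name `safe_write_parallelism` and the `reserved_connections` headroom are hypothetical, and the right limits depend on your Postgres instance (for reference, Postgres ships with a default `max_connections` of 100).

```python
def safe_write_parallelism(desired_partitions, max_connections,
                           reserved_connections=10):
    """Cap the Spark write parallelism at a database connection budget.

    desired_partitions:   partitions you would like to write with
    max_connections:      connection limit of the target database
    reserved_connections: headroom left for other clients (assumption)
    """
    budget = max_connections - reserved_connections
    if budget < 1:
        raise ValueError("no connections left for the Spark write")
    # Never go below one partition or above the connection budget.
    return max(1, min(desired_partitions, budget))


# Example: the database allows 100 connections, 10 are kept free for
# other clients, and the job would like 200 partitions.
partitions = safe_write_parallelism(200, max_connections=100)
print(partitions)  # 90
```

The result can then be passed to repartition or set as numPartitions on the write. Connection count is only one constraint; CPU, I/O, and index maintenance on the Postgres side may call for a lower number.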

Regards
