Performance issue while loading bulk data into Postgres DB from Databricks
03-01-2023 09:40 PM
We are facing a performance issue while loading bulk data into a Postgres DB from Databricks. We are using Spark JDBC connections to move the data. However, the rate of transfer is very low, which is causing a performance bottleneck. Is there a better approach to achieve this task?
- Labels:
  - databricks
  - Performance Issue
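For context, a common baseline that exhibits this kind of bottleneck is a plain Spark JDBC write with default settings: the default JDBC batch size is only 1000 rows, and the write parallelism is whatever partitioning the DataFrame happens to have. The connection details, credentials, and table name below are placeholders, not values from the thread:

```python
# Hypothetical baseline: a plain Spark JDBC write with defaults.
# With no explicit batchsize or repartitioning, Spark issues small
# 1000-row batches across however many partitions the DataFrame has,
# which is often too few connections for a fast bulk load.
# Host, database, user, and table names are placeholders.
baseline_options = {
    "url": "jdbc:postgresql://my-postgres-host:5432/mydb",  # placeholder
    "dbtable": "public.target_table",                        # placeholder
    "user": "spark_user",                                    # placeholder
    "password": "...",
    "driver": "org.postgresql.Driver",
}

def write_baseline(df):
    """Write a DataFrame to Postgres using Spark's JDBC defaults."""
    df.write.format("jdbc").options(**baseline_options).mode("append").save()
```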
03-02-2023 10:41 PM
@Janga Reddy
I remember that we had this kind of question before. Switching to another library partially solved the issue.
03-20-2023 11:57 PM
Hi @Janga Reddy
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
03-29-2023 07:30 PM
Hello @Janga Reddy @Daniel Sahal and @Vidula Khanna ,
To enhance performance in general, we need to design for more parallelism. In the Spark JDBC context, this is controlled by the number of partitions of the data being written.
The example here shows how to control parallelism while writing, driven by numPartitions during the read. While numPartitions is a Spark JDBC read parameter, the same effect can be achieved on a DataFrame using repartition (documentation here).
It is worth mentioning that parallel reads/writes can put pressure on the RDBMS (Postgres in this case): while the Spark write can happen in parallel, the sizing, capacity, and connection limits of the destination database should be taken into account and evaluated.
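To illustrate the repartition approach, here is a minimal sketch of a parallel JDBC write. The host, credentials, table name, partition count of 16, and batch size of 10000 are illustrative assumptions, not values from the thread; tune the partition count to what the destination Postgres instance can absorb:

```python
# Sketch of a parallel JDBC write to Postgres from Spark.
# All connection details below are placeholders.
jdbc_options = {
    "url": "jdbc:postgresql://my-postgres-host:5432/mydb",  # placeholder
    "dbtable": "public.target_table",                        # placeholder
    "user": "spark_user",                                    # placeholder
    "password": "...",
    "driver": "org.postgresql.Driver",
    "batchsize": "10000",  # rows per JDBC batch insert (Spark default: 1000)
}

def write_parallel(df, num_partitions=16):
    """Repartition to control write parallelism, then write via JDBC.

    Each partition opens its own connection to Postgres, so
    num_partitions is effectively the number of concurrent writers.
    """
    (df.repartition(num_partitions)
       .write
       .format("jdbc")
       .options(**jdbc_options)
       .mode("append")
       .save())
```

Increasing batchsize reduces round trips per connection, and repartition sets how many connections write concurrently; together they usually address the slow-transfer symptom described above, subject to the destination database's limits.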
Regards

