Databricks Community

Phani1 · ‎03-01-2023

We are facing a performance issue while loading bulk data into a Postgres DB from Databricks. We are using Spark JDBC connections to move the data. However, the transfer rate is very low, which is causing a performance bottleneck. Is there a better approach to achieve this task?

daniel_sahal · ‎03-02-2023

@Janga Reddy

I remember that we had this kind of question before. Switching to another library partially solved the issue.

https://community.databricks.com/s/question/0D58Y00009ia8JpSAI/getting-error-while-loading-parquet-d...

Anonymous · ‎03-20-2023

Hi @Janga Reddy

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

User16502773013 · ‎03-29-2023

Hello @Janga Reddy @Daniel Sahal and @Vidula Khanna ,

To improve performance in general, we need to design for more parallelism. In the Spark JDBC context, this is controlled by the number of partitions of the data to be written.

The example here shows how to control parallelism while writing, which is driven by numPartitions during the read. While numPartitions is set as a Spark JDBC read option in that example (Spark also accepts it on write, where it caps the number of concurrent connections), the same can be done on a DataFrame using repartition (documentation here).
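As a rough illustration of the repartition approach, here is a minimal PySpark-style sketch. The helper names (jdbc_write_options, write_in_parallel), the default values, and the placeholder URL and table are assumptions made for this example and do not come from the linked documentation; the option keys (url, dbtable, driver, numPartitions, batchsize) are standard Spark JDBC options.

```python
def jdbc_write_options(url, table, user, password,
                       max_connections=8, batch_size=10000):
    """Build the option map for a Spark JDBC write (hypothetical helper).

    The defaults are illustrative assumptions, not tuned recommendations.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Upper bound on write parallelism (and concurrent connections).
        "numPartitions": str(max_connections),
        # Rows sent per JDBC batch; Spark's default is 1000.
        "batchsize": str(batch_size),
    }


def write_in_parallel(df, options, num_partitions, mode="append"):
    """Repartition the DataFrame, then write it over JDBC.

    Each partition is written by its own task, so num_partitions
    sets how many writers run against the database at once.
    """
    (
        df.repartition(num_partitions)
        .write.format("jdbc")
        .options(**options)
        .mode(mode)
        .save()
    )


# Example call (placeholders, not real endpoints):
# opts = jdbc_write_options("jdbc:postgresql://<host>:5432/<db>",
#                           "public.target_table", "<user>", "<password>")
# write_in_parallel(df, opts, num_partitions=8)
```

Increasing num_partitions only helps while the database can keep up, so it is probably best raised gradually while watching load on the Postgres side.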

It is worth mentioning that parallel reads/writes can put pressure on the RDBMS (Postgres in this case). While the Spark write can happen in parallel, the sizing, capacity, and connectivity of the destination database should be evaluated and taken into account.
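One simple way to account for database capacity is to cap the partition count by the connections the database can spare, since each partition being written holds its own JDBC connection. The sketch below is only an assumption-laden illustration: capped_partitions and the reserved_connections default are hypothetical, and db_max_connections stands for the Postgres max_connections setting on your server.

```python
def capped_partitions(desired, db_max_connections, reserved_connections=10):
    """Limit write parallelism to the connections Postgres can spare.

    reserved_connections is an assumed headroom for other clients
    and administrative sessions; adjust it for your environment.
    """
    budget = db_max_connections - reserved_connections
    if budget < 1:
        raise ValueError("no connections left for the Spark write")
    # Never go below one partition or above the connection budget.
    return max(1, min(desired, budget))


# Example: a server with max_connections = 100 and 10 held back.
# capped_partitions(200, 100) returns 90
# capped_partitions(16, 100) returns 16
```

The result can then be used as the repartition count before the JDBC write.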

Regards

