06-05-2024 12:26 AM
I have created 7 jobs, one for each business system, to extract product data from each Postgres source and then write all of the job output into one data lake Delta table [raw_product].
Each business system's product table has around 20 GB of data.
I do the same thing for 15 tables.
Is there any way to read and write faster with Delta tables?
One job follows the pattern sketched below.
Daily data is loaded into the Delta table using the MERGE command.
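A minimal sketch of the daily merge pattern (the connection details, table names, and merge keys are placeholder assumptions, not the real job):

```python
from delta.tables import DeltaTable

# Sketch only -- connection details, table names, and merge keys are assumptions.
# Read the daily product slice from one Postgres business system.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")
      .option("dbtable", "public.product")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

# Upsert the slice into the shared data lake table.
target = DeltaTable.forName(spark, "raw_product")
(target.alias("t")
 .merge(df.alias("s"),
        "t.product_id = s.product_id AND t.source_system = s.source_system")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```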
Accepted Solutions
06-05-2024 12:51 PM
@AH - we can try out the config below.
If the read/fetch from Postgres is slow, we can increase fetchsize and numPartitions (to increase parallelism). Kindly try a df.count() to check where the slowness is.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
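For example, a minimal sketch of a parallel JDBC read (the partition column, bounds, and connection details are assumptions; note that numPartitions only splits the read when partitionColumn, lowerBound, and upperBound are also set):

```python
# Sketch only -- URL, credentials, partition column, and bounds are assumptions.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")
      .option("dbtable", "public.product")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("fetchsize", 10000)               # rows fetched per round trip to Postgres
      .option("partitionColumn", "product_id")  # must be a numeric, date, or timestamp column
      .option("lowerBound", 1)
      .option("upperBound", 50000000)
      .option("numPartitions", 16)              # parallel connections / read tasks
      .load())

df.count()  # materialise the read to time it in isolation
```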
If the write is slow, kindly try writing the data to a temp table first before the merge, to see whether the slowdown is caused by the merge itself.
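For instance, a sketch of staging into a temp Delta table and then merging (table and column names are assumptions), which lets you time the write and the merge separately:

```python
# Step 1: land the extract in a staging Delta table and time this write on its own.
df.write.format("delta").mode("overwrite").saveAsTable("raw_product_staging")

# Step 2: run the existing MERGE with the staging table as the source,
# so the merge cost can be measured separately from the write cost.
spark.sql("""
    MERGE INTO raw_product AS t
    USING raw_product_staging AS s
      ON t.product_id = s.product_id AND t.source_system = s.source_system
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```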

