topic Re: Delta Lake Table Daily Read and Write job optimization in Data Engineering

Delta Lake Table Daily Read and Write job optimization

AH — Wed, 05 Jun 2024 07:26:58 GMT

I have created 7 jobs, one for each business system, to extract product data from each Postgres source and then write all of the job data into one Delta Lake table [raw_product].

Each business system's product table has around 20 GB of data.

I do the same thing for 15 tables.

Is there any way to read and write faster with Delta tables?

One job looks like the one below.

Daily data is loaded into the Delta table using the MERGE command.

Re: Delta Lake Table Daily Read and Write job optimization

shan_chandra — Wed, 05 Jun 2024 19:51:23 GMT

@AH - we can try out the config

If the read or fetch from Postgres is slow, we can increase fetchsize and numPartitions (to increase parallelism). Kindly run a df.count() on the JDBC DataFrame to check whether the slowness is on the read side.

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
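
As a rough sketch of the read-side tuning above, the options could be assembled like this. The option keys (fetchsize, numPartitions, partitionColumn, lowerBound, upperBound) are the ones listed on the Spark JDBC page linked above. The helper names, the default values (8 partitions, fetch size 10000), and the product_id column in the usage note are assumptions for illustration, not something from the original job. Spark requires partitionColumn, lowerBound, and upperBound to be given together, and the partition column must be numeric, date, or timestamp.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column, lower_bound, upper_bound,
                      num_partitions=8, fetch_size=10000):
    """Build Spark JDBC options for a parallel read from Postgres.

    Hypothetical helper: the defaults are starting points to tune,
    not recommended values.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Rows fetched per round trip to the database.
        "fetchsize": str(fetch_size),
        # Number of parallel JDBC connections / Spark partitions.
        "numPartitions": str(num_partitions),
        # Column and value range used to split the read into partitions.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
    }


def read_table(spark, options):
    """Load the source table through the JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()
```

In a notebook this might be used as `df = read_table(spark, jdbc_read_options(url, "public.product", user, password, "product_id", 1, 50000000))` followed by `df.count()`, timing the count with different fetchsize and numPartitions values. The bounds only decide how the range is split; rows outside them are still read.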

If the write is slow, kindly try writing the data to a temp table first, before the merge, to see whether the merge itself is causing the slowness.
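
One way to separate the two costs is to time the staging write and the merge independently, roughly as sketched below. The staging table name, the key columns, and both helper names are hypothetical; the merge keys for raw_product were not given in the thread. The MERGE statement uses Delta Lake's `UPDATE SET *` / `INSERT *` shorthand, which assumes the staging and target tables have matching columns.

```python
import time


def build_merge_sql(target, staging, key_columns):
    """Build a Delta MERGE statement that upserts staging into target."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {staging} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def stage_then_merge(spark, df, target, staging, key_columns):
    """Write df to a staging Delta table, then merge it, timing each step."""
    t0 = time.perf_counter()
    # Step 1: plain write, no merge involved.
    df.write.format("delta").mode("overwrite").saveAsTable(staging)
    t1 = time.perf_counter()
    # Step 2: merge from the already-materialized staging table.
    spark.sql(build_merge_sql(target, staging, key_columns))
    t2 = time.perf_counter()
    return {"stage_seconds": t1 - t0, "merge_seconds": t2 - t1}
```

For example, `stage_then_merge(spark, df, "raw_product", "raw_product_staging", ["source_system", "product_id"])` returns the two durations. If the staging write is fast and the merge dominates, the merge condition and the layout of the target table are the places to look; if the staging write is already slow, the bottleneck is more likely the JDBC read feeding it.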