06-05-2024 12:26 AM
I have created 7 jobs, one for each business system, to extract product data from each Postgres source and then write all of the job output into one data lake Delta table [raw_product].
Each business system's product table has around 20 GB of data.
I do the same thing for 15 tables.
Is there any way to read and write faster with Delta tables?
One job follows the pattern sketched below.
Daily data is loaded into the Delta table using the MERGE command.
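A minimal sketch of the daily merge pattern (the connection details, table names, and merge keys are placeholder assumptions, not the real job):

```python
from delta.tables import DeltaTable

# Sketch only -- connection details, table names, and merge keys are assumptions.
# Read the daily product slice from one Postgres business system.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")
      .option("dbtable", "public.product")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

# Upsert the slice into the shared data lake table.
target = DeltaTable.forName(spark, "raw_product")
(target.alias("t")
 .merge(df.alias("s"),
        "t.product_id = s.product_id AND t.source_system = s.source_system")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```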
Accepted Solutions
06-05-2024 12:51 PM
@AH - we can try out the config below.
If the read/fetch from Postgres is slow, we can increase fetchsize and numPartitions (to increase parallelism). Kindly try a df.count() to check where the slowness is.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
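For example, a minimal sketch of a parallel JDBC read (the partition column, bounds, and connection details are assumptions; note that numPartitions only splits the read when partitionColumn, lowerBound, and upperBound are also set):

```python
# Sketch only -- URL, credentials, partition column, and bounds are assumptions.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")
      .option("dbtable", "public.product")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("fetchsize", 10000)               # rows fetched per round trip to Postgres
      .option("partitionColumn", "product_id")  # must be a numeric, date, or timestamp column
      .option("lowerBound", 1)
      .option("upperBound", 50000000)
      .option("numPartitions", 16)              # parallel connections / read tasks
      .load())

df.count()  # materialise the read to time it in isolation
```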
If the write is slow, kindly try writing the data to a temp table first before the merge, to see whether the slowdown is caused by the merge itself.
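For instance, a sketch of staging into a temp Delta table and then merging (table and column names are assumptions), which lets you time the write and the merge separately:

```python
# Step 1: land the extract in a staging Delta table and time this write on its own.
df.write.format("delta").mode("overwrite").saveAsTable("raw_product_staging")

# Step 2: run the existing MERGE with the staging table as the source,
# so the merge cost can be measured separately from the write cost.
spark.sql("""
    MERGE INTO raw_product AS t
    USING raw_product_staging AS s
      ON t.product_id = s.product_id AND t.source_system = s.source_system
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```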

