Hi,
I need to improve the performance of a Databricks job in my project. These are the steps the job performs:
1. Read small CSV/JSON files (roughly 50–100 MB each) from multiple locations in S3
2. Write the data to the bronze layer in Delta/Parquet format
3. Read from the bronze layer
4. Apply some filters for data cleaning
5. Write to the silver layer in Delta/Parquet format
6. Read from the silver layer
7. Perform many joins and other transformations such as union and distinct
8. Write the final data to AWS RDS
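In PySpark terms, the pipeline looks roughly like this (a sketch only; all S3 paths, column names, and JDBC connection details below are placeholders, not the real project values):

```python
# Sketch of the job described above. Paths, column names, and JDBC
# options are hypothetical placeholders.
def run_pipeline(spark):
    from pyspark.sql import functions as F  # imported here for illustration

    # 1-2. Ingest small CSV/JSON files from S3 and land them as Delta in bronze.
    raw_csv = spark.read.option("header", "true").csv("s3://my-bucket/raw/csv/")
    raw_json = spark.read.json("s3://my-bucket/raw/json/")
    raw_csv.write.format("delta").mode("append").save("s3://my-bucket/bronze/t1/")
    raw_json.write.format("delta").mode("append").save("s3://my-bucket/bronze/t2/")

    # 3-5. Read bronze, apply cleaning filters, write to silver.
    bronze = spark.read.format("delta").load("s3://my-bucket/bronze/t1/")
    silver = bronze.filter(F.col("id").isNotNull())
    silver.write.format("delta").mode("overwrite").save("s3://my-bucket/silver/t1/")

    # 6-8. Read silver, join/distinct, then push the result to RDS over JDBC.
    s1 = spark.read.format("delta").load("s3://my-bucket/silver/t1/")
    s2 = spark.read.format("delta").load("s3://my-bucket/silver/t2/")
    final = s1.join(s2, "id", "inner").distinct()
    (final.write.format("jdbc")
        .option("url", "jdbc:mysql://my-rds-host:3306/mydb")  # placeholder URL
        .option("dbtable", "target_table")
        .option("user", "user")
        .option("password", "secret")
        .mode("append")
        .save())
```

On the cluster this is invoked as `run_pipeline(spark)` with the session that Databricks provides.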
I'm not seeing the performance I expect: even for a 5 KB input, the job takes almost 1 minute 30 seconds.
I've also observed that there isn't enough parallelism, and not all cores are being utilized (the cluster has 4 cores).
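For context, this is roughly how I checked the utilization (a sketch; `df` stands for any intermediate DataFrame in the job):

```python
# Quick checks for parallelism, run in the Databricks notebook where
# `spark` is the provided session. `df` is any DataFrame from the job.
def report_parallelism(spark, df):
    # Number of cores Spark sees (4 in my case).
    print("default parallelism:", spark.sparkContext.defaultParallelism)
    # Partitions in the DataFrame; with tiny inputs this is often 1,
    # which leaves three of the four cores idle.
    print("df partitions:", df.rdd.getNumPartitions())
    # Shuffle stages use this many partitions by default (200),
    # most of them empty when the data is only a few KB.
    print("shuffle partitions:", spark.conf.get("spark.sql.shuffle.partitions"))
```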
Could you please suggest some improvements?