Re: Performance Issue : Create DELTA table form 2 ...

shan_chandra · ‎01-31-2023

@Kuldeep Chitrakar - Please review the physical plan of the CTAS query (using EXPLAIN) before creating the table. Below are a few things to validate before tuning the cluster size.

Validate the join conditions used in the CTAS query.
Will a plain SELECT query work?
Tune spark.sql.shuffle.partitions to see whether more tasks run in parallel and reduce the time taken.
Is there skew in the join?
Will the AQE (Adaptive Query Execution) config help? (https://docs.databricks.com/optimizations/aqe.html)
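
For reference, the checks above could be sketched in PySpark roughly as follows. This is only a minimal sketch, not a tested recipe for your workload: the helper names, the table and column names in the usage comments, and the value 400 are all hypothetical placeholders. It assumes `spark` is the SparkSession already available in a Databricks notebook.

```python
# Minimal sketch; helper names, table names, and the partition count are
# hypothetical. `spark` is assumed to be an existing SparkSession.

def explain_select(spark, select_sql):
    """Return the physical plan text for the SELECT part of the CTAS."""
    # EXPLAIN returns a single row with a single column holding the plan.
    return spark.sql(f"EXPLAIN FORMATTED {select_sql}").collect()[0][0]


def apply_tuning(spark, shuffle_partitions):
    """Enable AQE (including skew-join handling) and set shuffle partitions."""
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
    spark.conf.set("spark.sql.shuffle.partitions", str(shuffle_partitions))


def top_join_keys(spark, table, key, n=10):
    """List the most frequent join-key values; a few very large counts hint at skew."""
    query = (
        f"SELECT {key}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {key} ORDER BY cnt DESC LIMIT {n}"
    )
    return spark.sql(query).collect()


# Example usage in a notebook (hypothetical tables and columns):
#   select_sql = "SELECT a.*, b.val FROM table_a a JOIN table_b b ON a.id = b.id"
#   print(explain_select(spark, select_sql))
#   for row in top_join_keys(spark, "table_a", "id"):
#       print(row)
#   apply_tuning(spark, 400)
```

If the plain SELECT is already slow, the issue is most likely in the join or shuffle rather than in the Delta write. Check the plan and the key distribution first, then change one setting at a time and compare run times.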