shan_chandra
Databricks Employee

@Kuldeep Chitrakar - Please review the physical plan of the CTAS query (run EXPLAIN on it) before creating the table. Below are a few things to validate before tuning the cluster size.

  1. Validate the join conditions used in the CTAS query.
  2. Does a plain SELECT query (without the CREATE TABLE) work?
  3. Tune spark.sql.shuffle.partitions to see whether more tasks run in parallel and reduce the time taken.
  4. Is there skew in the join?
  5. Would the AQE (Adaptive Query Execution) config help? (https://docs.databricks.com/optimizations/aqe.html)
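
The checks above could be scripted roughly as follows. This is only a sketch: the table and column names (sales, customers, customer_id, segment) are hypothetical placeholders for your own query, the helper functions are made up for illustration, and the setting values are starting points to experiment with rather than recommendations. It assumes an existing SparkSession named spark, as in a Databricks notebook.

```python
# Sketch only: sales, customers, customer_id and segment are hypothetical names.

# The SELECT that would feed the CTAS (placeholder query, replace with your own).
SELECT_SQL = (
    "SELECT s.*, c.segment "
    "FROM sales s JOIN customers c ON s.customer_id = c.customer_id"
)

# Session-level settings to experiment with; the values are assumptions, not advice.
TUNING_SETTINGS = {
    "spark.sql.shuffle.partitions": "400",
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def explain_statement(select_sql):
    """Wrap a SELECT in EXPLAIN FORMATTED to view the physical plan without running it."""
    return "EXPLAIN FORMATTED " + select_sql


def skew_check_statement(table, key, top_n=20):
    """Count rows per join key; a few very large counts suggest a skewed join."""
    return (
        f"SELECT {key}, COUNT(*) AS row_count FROM {table} "
        f"GROUP BY {key} ORDER BY row_count DESC LIMIT {top_n}"
    )


def apply_settings(spark, settings):
    """Apply each setting on an existing SparkSession via spark.conf.set."""
    for name, value in settings.items():
        spark.conf.set(name, value)
```

In a notebook you could then run, for example, spark.sql(explain_statement(SELECT_SQL)).show(truncate=False) to inspect the plan, spark.sql(skew_check_statement("sales", "customer_id")).show() to look for skew, and apply_settings(spark, TUNING_SETTINGS) before retrying the CTAS.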