01-31-2023 10:01 AM
@Kuldeep Chitrakar - Please run an explain plan on the CTAS query and review its physical plan before creating the table. Below are a few things to validate before tuning the cluster size.
- Validate the join conditions used in the CTAS query.
- Does a plain SELECT query (without the CREATE TABLE part) work?
- Try tuning spark.sql.shuffle.partitions to see whether more tasks run in parallel and reduce the time taken.
- Is there skew in the join?
- Will the AQE config help? (https://docs.databricks.com/optimizations/aqe.html)
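
The checks above could be scripted roughly as follows in a notebook. This is only a sketch, not a tested recipe for your workload: the helper names (explain_select, top_join_keys, apply_tuning) are hypothetical, `spark` is assumed to be the SparkSession the notebook already provides, and the value 400 for shuffle partitions is just a placeholder to experiment with. The config keys themselves are standard Spark SQL settings.

```python
# Hypothetical helpers for the checklist above. `spark` is assumed to be an
# existing SparkSession (Databricks notebooks provide one).

def explain_select(spark, select_sql):
    """Return the plan text for the SELECT part of the CTAS, without running it."""
    # EXPLAIN returns a single row with a single column holding the plan.
    rows = spark.sql(f"EXPLAIN FORMATTED {select_sql}").collect()
    return rows[0][0]


def top_join_keys(spark, table, key, limit=20):
    """Row counts per join key, largest first. A few dominant keys suggest skew."""
    query = (
        f"SELECT {key} AS join_key, COUNT(*) AS row_count "
        f"FROM {table} GROUP BY {key} ORDER BY row_count DESC LIMIT {limit}"
    )
    return spark.sql(query).collect()


def apply_tuning(spark, shuffle_partitions=400):
    """Set shuffle parallelism and enable AQE with skew-join handling."""
    settings = {
        # Placeholder value: adjust based on data volume and cluster cores.
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
    }
    for name, value in settings.items():
        spark.conf.set(name, value)
    return settings
```

For example, print(explain_select(spark, "SELECT ... FROM a JOIN b ON ...")) with the SELECT from your CTAS shows the join strategy chosen, and top_join_keys(spark, "a", "join_col") (table and column names here are placeholders) shows whether a handful of keys carry most of the rows.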