How to enable AQE in foreachBatch mode
12-10-2024 12:29 AM - edited 12-10-2024 12:33 AM
(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs/loc")
    .foreachBatch(transform_and_upsert)
    .outputMode("update")
    .trigger(availableNow=True)
    .start()
)
Because the data is skewed, I want to enable AQE and turn on the skew join optimization:
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")
However, when I checked the Spark UI settings, the value was set to false: spark.sql.adaptive.enabled = false.
I am using Databricks DBR 14.3.x-photon-scala2.12 with Photon enabled.
According to this blog post, https://www.databricks.com/blog/adaptive-query-execution-structured-streaming, AQE supports foreachBatch streaming queries starting from DBR 13.2.
Here are the settings shown in the DataFrame properties tab.
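To cross-check what the Spark UI reports, the effective values could also be read from inside the batch function. This is only a minimal sketch: `effective_aqe_settings` and `AQE_KEYS` are hypothetical names introduced for illustration, and the sketch assumes that the micro-batch DataFrame's `sparkSession` is the session that actually runs the batch, which may not be the notebook's `spark` session.

```python
# Config keys taken from the question above.
AQE_KEYS = [
    "spark.sql.adaptive.enabled",
    "spark.sql.adaptive.skewJoin.enabled",
    "spark.sql.adaptive.forceOptimizeSkewedJoin",
]


def effective_aqe_settings(session):
    """Return the AQE-related settings as seen by the given session.

    `session` is expected to expose `conf.get(key, default)`, as a
    PySpark SparkSession does.
    """
    # The second argument is returned when the key has no value and no default.
    return {key: session.conf.get(key, "<unset>") for key in AQE_KEYS}


def transform_and_upsert(batch_df, batch_id):
    # Assumption: batch_df.sparkSession is the session executing this
    # micro-batch; it may differ from the notebook's global `spark`.
    settings = effective_aqe_settings(batch_df.sparkSession)
    print(f"batch {batch_id}: {settings}")
    # ... the existing transform and upsert logic would go here ...
```

If the printed values differ from what `spark.conf.get` returns in the notebook, that would suggest the streaming query is running with its own copy of the session configuration.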
Labels: Spark
12-10-2024 04:21 AM
@mjedy78
Did you set the config at the cluster level or the notebook level?
Can you try setting these configs in the cluster properties and check whether that helps?
12-10-2024 05:00 AM - edited 12-10-2024 05:01 AM
I have tried both.
What I am triggering is a job. In the job, I first set the configs at the notebook level by adding spark.conf.set calls.
Then I also added some configs to the job cluster.
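A third place the configs could be set is inside the foreachBatch function itself, on the session attached to the micro-batch DataFrame. The following is an untested sketch, not a confirmed fix: `apply_aqe_settings` and `AQE_SETTINGS` are hypothetical names, and whether DBR 14.3 honors values set this way is an assumption that nothing in this thread confirms.

```python
# Settings taken from the original question.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.forceOptimizeSkewedJoin": "true",
}


def apply_aqe_settings(session, settings=AQE_SETTINGS):
    """Set each config on `session` and return the values read back."""
    for key, value in settings.items():
        session.conf.set(key, value)
    # Read the values back so any mismatch is visible in the logs.
    return {key: session.conf.get(key) for key in settings}


def transform_and_upsert(batch_df, batch_id):
    # Assumption: configs set on the micro-batch session take effect for
    # the queries issued below; this is not verified here.
    applied = apply_aqe_settings(batch_df.sparkSession)
    print(f"batch {batch_id}: applied {applied}")
    # ... the existing transform and upsert logic would go here ...
```

Reading the values back right after setting them makes it easy to compare the job logs against what the Spark UI shows for the same batch.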
12-10-2024 10:03 AM
@MuthuLakshmi any idea?

