How to enable AQE in foreachBatch mode
12-10-2024 12:29 AM - edited 12-10-2024 12:33 AM
I process daily data incrementally, from checkpoint to checkpoint, using foreachBatch in a streaming query:
(df.writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs/loc")
    .foreachBatch(transform_and_upsert)
    .outputMode("update")
    .trigger(availableNow=True)
    .start())
Because the data is skewed, I want to enable AQE and turn on skew join optimization:
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")
However, when I checked the Spark UI settings, the value was set to false: spark.sql.adaptive.enabled = false.
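One thing that might be worth trying is setting and reading the options on the session attached to the micro-batch DataFrame, rather than on the notebook's `spark`. My understanding is that a streaming query runs its batches on a session cloned from the one that started it, so the two can hold different values. Below is a minimal sketch of that check. `AQE_SETTINGS` and `apply_aqe_settings` are hypothetical names I made up for illustration, and I have not confirmed that Databricks honors values set this way.

```python
# Hypothetical helper names (AQE_SETTINGS, apply_aqe_settings) are for
# illustration only; they are not part of any Spark or Databricks API.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.forceOptimizeSkewedJoin": "true",
}


def apply_aqe_settings(session):
    """Set the AQE options on `session` and return the values read back."""
    for key, value in AQE_SETTINGS.items():
        session.conf.set(key, value)
    # Read the values back so the log shows what the session actually holds.
    return {key: session.conf.get(key) for key in AQE_SETTINGS}


def transform_and_upsert(batch_df, batch_id):
    # Assumption: the micro-batch DataFrame may be bound to a different
    # session than the notebook's `spark`, so configure that one.
    effective = apply_aqe_settings(batch_df.sparkSession)
    print(f"batch {batch_id}: {effective}")
    # ... existing transform and upsert logic goes here ...
```

If the printed values are "true" but the Spark UI still reports false for the batch query, the setting is being overridden somewhere else and this approach does not help.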
I am using Databricks DBR 14.3.x-photon-scala2.12 with Photon enabled.
According to this blog post, https://www.databricks.com/blog/adaptive-query-execution-structured-streaming, AQE is supported for streaming foreachBatch queries starting from DBR 13.2.
Here are the settings shown in the DataFrame properties tab: