Databricks Community

mjedy78 · ‎12-10-2024

I process daily data incrementally, from checkpoint to checkpoint, using foreachBatch in a Structured Streaming job.

(
    df.writeStream.format("delta")
    .option("checkpointLocation", "dbfs/loc")
    .foreachBatch(transform_and_upsert)
    .outputMode("update")
    .trigger(availableNow=True)
    .start()
)

Because the data is skewed, I want to enable AQE and turn on the skew join optimization:

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")

However, when I checked the Spark UI settings, the value was set to false: spark.sql.adaptive.enabled = false.

I am using Databricks DBR 14.3.x-photon-scala2.12 with Photon enabled.

According to this blog post, https://www.databricks.com/blog/adaptive-query-execution-structured-streaming, AQE is supported for foreachBatch streaming queries starting from DBR 13.2.

Here are the settings from the DataFrame properties tab.

MuthuLakshmi · ‎12-10-2024

@mjedy78
Did you set the config at the cluster level or the notebook level?
Can you try setting these configs in the cluster properties and check whether that helps?

mjedy78 · ‎12-10-2024

I have tried both.
What I am triggering is a job. In the job, I first set the configs at the notebook level by adding spark.conf.set calls.
Then I also added some configs to the job cluster.
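
One thing that may be worth checking, since both the notebook-level and cluster-level settings were tried: the values the micro-batch DataFrame actually sees inside the foreachBatch function. The sketch below reads, and optionally sets, the three configs on the session attached to the batch DataFrame. The helper name read_aqe_settings and the "<unset>" placeholder are hypothetical, and whether setting the configs at this point changes the value shown in the Spark UI is an assumption that this thread does not confirm.

```python
# Config keys taken from the original post.
AQE_KEYS = (
    "spark.sql.adaptive.enabled",
    "spark.sql.adaptive.skewJoin.enabled",
    "spark.sql.adaptive.forceOptimizeSkewedJoin",
)


def read_aqe_settings(session):
    """Return the AQE-related settings visible to the given SparkSession.

    Hypothetical helper: "<unset>" is returned for keys that have no value.
    """
    return {key: session.conf.get(key, "<unset>") for key in AQE_KEYS}


def transform_and_upsert(batch_df, batch_id):
    # Use the session attached to the micro-batch DataFrame, which may not
    # be the same object as the notebook's global `spark` variable.
    session = batch_df.sparkSession

    # Assumption: setting the configs here affects the queries run for this
    # micro-batch. Compare the printed values with the Spark UI to verify.
    for key in AQE_KEYS:
        session.conf.set(key, "true")

    print(f"batch {batch_id}: {read_aqe_settings(session)}")

    # ... the existing transform and upsert logic would follow here ...
```

The function keeps the (batch_df, batch_id) signature that foreachBatch expects, so it can be passed to the writeStream call from the first post unchanged. If the printed values are "true" while the UI still reports false, the difference is more likely in how the UI reports the setting than in how it was applied.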