I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?
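For concreteness, here is how I understand the two options I'm comparing. This is just a sketch; the catalog, schema, and table names are hypothetical:

```python
# 1) Manual, on-demand compaction via the OPTIMIZE command:
spark.sql("OPTIMIZE main.sales.events")

# 2) Auto Optimize enabled through table properties, so small files
#    are handled automatically at write time:
spark.sql("""
    ALTER TABLE main.sales.events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```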
I am running jobs on Databricks using the Run Submit API with Airflow. I have noticed that, on rare occasions, a particular run is executed more than once at the same time. Why does this happen?
Your checkpoint code looks correct.
What is the source of `df`? Is it `/Volumes/dev_catalog/default/streaming_basics/` ? The path looks incorrect - add `stream` to it.
When you swapped back to the old checkpoint, were any records flowing through, and were batches completing? It's possible that you've accumulated a big backlog with the old checkpoint, and/or records in Kafka have expired. And the "startingOffsets" o...
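For reference, a minimal sketch of where "startingOffsets" sits in a Kafka stream read. The broker address, topic, and checkpoint path here are hypothetical, and note the key caveat in the comment:

```python
df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        # startingOffsets only applies when the query starts with a FRESH
        # checkpoint; an existing checkpoint's stored offsets take
        # precedence, which matters when swapping checkpoints back and forth.
        .option("startingOffsets", "earliest")
        .load())

query = (df.writeStream
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/events")
           .toTable("main.sales.events_bronze"))
```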
That error is usually related to driver load. Try upsizing the driver one size and see if it still happens.
Otherwise, for troubleshooting, driver problems are surfaced in the cluster's event log as events like DRIVER_NOT_RESPONDING and DRIVER_UNAVAILABLE. Yo...
This looks like a misconfigured Query Watchdog, specifically the config below:
spark.conf.get("spark.databricks.queryWatchdog.outputRatioThreshold")
Please check the value of this config - it is 1000 by default. Also, we recommend using Jobs Comput...
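As a quick config-check sketch (run on the cluster where the query fails; the value 5000 below is just an illustrative override, not a recommendation):

```python
# Inspect the current Query Watchdog output ratio threshold (default: 1000).
current = spark.conf.get("spark.databricks.queryWatchdog.outputRatioThreshold")
print(current)

# If a legitimate query produces many output rows per input row, the
# threshold can be raised for the session rather than disabling the watchdog:
spark.conf.set("spark.databricks.queryWatchdog.outputRatioThreshold", 5000)
```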