I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?
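For reference, here is a minimal sketch of what each looks like in practice. The table name `my_table` and column `event_date` are placeholders, not from the original question:

```sql
-- OPTIMIZE: an explicit, on-demand command you run (or schedule) yourself;
-- compacts small files and can co-locate data with ZORDER
OPTIMIZE my_table ZORDER BY (event_date);

-- Auto Optimize: table properties that compact files automatically
-- as part of writes, with no separate command to run
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);
```

Roughly: Auto Optimize trades a little write-time overhead for continuously reasonable file sizes, while a scheduled OPTIMIZE (optionally with ZORDER) is a heavier, batch-style compaction you control explicitly.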
I am running jobs on Databricks using the Run Submit API with Airflow. I have noticed that, on rare occasions, a single submitted run is executed more than once concurrently. Why?
That error is usually related to driver load. Try increasing the driver node size by one tier and see if the error still occurs.
Otherwise, for troubleshooting, driver problems are surfaced in the cluster's event log as events such as DRIVER_NOT_RESPONDING and DRIVER_UNAVAILABLE. Yo...
This looks like a misconfigured Query Watchdog, specifically this config:
spark.conf.get("spark.databricks.queryWatchdog.outputRatioThreshold")
Please check the value of this config; it defaults to 1000. Also, we recommend using Jobs Comput...
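If the threshold turns out to be set too low for your workload, a sketch of how you might inspect and raise it (this assumes an active SparkSession named `spark` on Databricks, so it is not runnable standalone; the value 10000 is an illustrative choice, not a recommendation):

```python
# Read the current Query Watchdog output ratio threshold (default "1000")
current = spark.conf.get("spark.databricks.queryWatchdog.outputRatioThreshold")

# Relax the threshold for this session if legitimate queries are being killed
spark.conf.set("spark.databricks.queryWatchdog.outputRatioThreshold", 10000)
```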
I don't have full access to that article, but here's something that might help clarify things!
While Spark uses lazy evaluation (meaning it waits to execute until absolutely necessary), Python uses eager evaluation. This means that when you ru...
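The distinction can be illustrated in plain Python, as an analogy rather than actual Spark code: a list comprehension evaluates eagerly the moment it is written, while a generator defers all work until its values are consumed, much like Spark defers execution until an action such as `collect()` is called.

```python
calls = []

def track(x):
    """Record that this value was processed, then double it."""
    calls.append(x)
    return x * 2

# Eager: the work happens immediately, like ordinary Python code
eager = [track(x) for x in range(3)]
assert calls == [0, 1, 2]

calls.clear()

# Lazy: building the generator does no work at all,
# similar to Spark building a plan of transformations
lazy = (track(x) for x in range(3))
assert calls == []          # nothing has run yet

# Consuming the generator is the "action" that forces evaluation
result = list(lazy)
assert calls == [0, 1, 2]
```

The practical upshot is the same in both worlds: with lazy evaluation, errors and costs only surface when the results are actually demanded, not when the pipeline is defined.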