I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?
I am running jobs on Databricks using the Run Submit API with Airflow. I have noticed that, on rare occasions, a single submitted run is launched more than once concurrently. Why does this happen?
We do not recommend using spot instances with distributed ML training workloads that use barrier execution mode, such as TorchDistributor, as these workloads are extremely sensitive to executor loss. Please disable spot/preemptible instances and try again.
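As a hedged illustration, a cluster spec that forces on-demand instances on AWS might look like the following (field names follow the Databricks Clusters API; the runtime version, node type, and worker count are placeholders for your own workload):

```json
{
  "spark_version": "15.4.x-gpu-ml-scala2.12",
  "node_type_id": "g5.xlarge",
  "num_workers": 4,
  "aws_attributes": {
    "availability": "ON_DEMAND"
  }
}
```

On Azure, the analogous knob lives under `azure_attributes` (for example an on-demand `availability` setting); the key point is that no worker should be eligible for preemption while barrier-mode training is running.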
We have customers that read millions of files per hour or more using Databricks Auto Loader. For high-volume use cases, we recommend enabling file notification mode, which, instead of continuously performing list operations on the filesystem, uses cloud-native notification services (for example, AWS SQS/SNS, Azure Event Grid and Queue Storage, or Google Pub/Sub) to discover new files as they arrive.
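As a sketch, the switch to file notification mode is just a matter of stream options. The helper below is hypothetical (the option keys are the documented Auto Loader `cloudFiles.*` settings; the queue name is a placeholder):

```python
# Hypothetical helper: builds the Auto Loader option map for
# file notification mode. Option keys are the documented
# cloudFiles.* settings; the queue name is a placeholder.
def autoloader_notification_options(fmt: str, queue_name: str) -> dict:
    return {
        "cloudFiles.format": fmt,
        # Switch from directory listing to file notification mode.
        "cloudFiles.useNotifications": "true",
        # Optionally reuse an existing queue instead of letting
        # Auto Loader provision one on your behalf.
        "cloudFiles.queueName": queue_name,
    }

opts = autoloader_notification_options("json", "my-existing-queue")
# On Databricks these options would be applied via, e.g.:
#   spark.readStream.format("cloudFiles").options(**opts).load(path)
```

Note that when Auto Loader provisions the notification infrastructure itself, the compute's identity needs permission to create queues and subscriptions in your cloud account.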
In the Linked Services panel in the Azure Data Factory UI, find your Databricks Linked Service and click the curly braces {} to edit the service.
From here, you'll see the linked service definition in JSON. Create a new key, properties.typeProperties.policyId, and set its value to the ID of the Databricks cluster policy you want the linked service to use.
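A hedged sketch of the edited linked service JSON (the service name, workspace URL, and policy ID below are placeholders; only the `policyId` key is the addition being described):

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "policyId": "ABC123DEF4567890"
    }
  }
}
```

The policy ID itself can be copied from the cluster policy's page in the Databricks workspace UI.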
If you're using dedicated compute, please be aware that self-joins are blocked by default when data filtering is invoked. You can allow them by setting spark.databricks.remoteFiltering.blockSelfJoins to false on the compute you are running these commands on.
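For example, the override can be placed in the cluster's Spark config box (a config fragment, shown in the `key value` format that field expects):

```
spark.databricks.remoteFiltering.blockSelfJoins false
```

Be aware that this relaxes a default safeguard, so it is worth confirming your data-filtering rules behave as expected on self-joins before enabling it broadly.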