Hi everyone,
We’re in the process of migrating from all-purpose clusters to serverless compute in Databricks. On our all-purpose clusters, we’ve been setting specific Spark configurations (e.g., via the cluster’s advanced options). However, we’ve noticed that serverless compute doesn’t expose the same “Advanced Options” UI for Spark config overrides.
Is there a recommended way to apply custom Spark configurations when using serverless compute? Ideally, we’d like to apply these settings at cluster start or job submission time.
Any guidance or best practices would be greatly appreciated!
Below are the configs we are setting:
Enable dynamic partition overwrite:
- spark.sql.sources.partitionOverwriteMode dynamic
Suppress the default marker files written to output directories (_started, _committed, _SUCCESS):
- mapreduce.fileoutputcommitter.marksuccessfuljobs false
- parquet.enable.summary-metadata false
- spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
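In case it helps frame the question, here's a rough sketch of the session-level approach we're considering as a workaround: setting the same keys with `spark.conf.set` from a notebook or job task. This assumes serverless allows these particular keys to be modified at runtime, which we haven't confirmed, so the sketch wraps each call to surface any that get rejected.

```python
# Sketch only: attempt the same configs as session-level overrides on serverless.
# Assumes these keys are settable at runtime; serverless may reject some of them.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

configs = {
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "mapreduce.fileoutputcommitter.marksuccessfuljobs": "false",
    "parquet.enable.summary-metadata": "false",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
}

for key, value in configs.items():
    try:
        spark.conf.set(key, value)  # session-scoped override
        print(f"set {key} = {value}")
    except Exception as e:  # e.g. config not modifiable on this compute type
        print(f"could not set {key}: {e}")
```

If some of these aren't settable at the session level on serverless, we'd love to know what the recommended alternative is (job/task-level settings, defaults at the workspace level, etc.).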
Thanks,
Mac