Hi everyone,
We're in the process of migrating from all-purpose clusters to serverless compute in Databricks. On our all-purpose clusters, we've been setting specific Spark configurations (e.g., via the cluster's advanced options). However, we've noticed that serverless compute doesn't expose the same "Advanced Options" UI for Spark config overrides.
Is there a recommended way to apply custom Spark configurations when using serverless compute? Ideally, we'd like to apply these settings at cluster start or job submission time.
Any guidance or best practices would be greatly appreciated!
Below are the configs we are setting:
Dynamic partition overwrite:
- spark.sql.sources.partitionOverwriteMode dynamic
Remove default files written (_started, _SUCCESS, _committed):
- mapreduce.fileoutputcommitter.marksuccessfuljobs false
- parquet.enable.summary-metadata false
- spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
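For context, here's roughly what we'd try first in a serverless notebook: setting these at session scope with `spark.conf.set` (where `spark` is the SparkSession Databricks provides in notebooks). This is just a sketch of our intended approach; we haven't confirmed that serverless accepts all of these keys at session scope, and we suspect some may be rejected:

```python
# Attempted session-scoped overrides in a serverless notebook.
# Note: serverless compute may reject some of these keys at session scope;
# this is what we'd try first, not a confirmed working approach.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
spark.conf.set("parquet.enable.summary-metadata", "false")
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
)
```

If session-level overrides aren't supported for some of these, we'd appreciate pointers to whatever the supported alternative is (job-level settings, workspace defaults, etc.).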
Thanks,
Mac