Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Add Spark Configurations Serverless Compute

mac_delvalle
New Contributor II

Hi everyone,

We’re in the process of migrating from all-purpose clusters to serverless compute in Databricks. On our all-purpose clusters, we’ve been setting specific Spark configurations (e.g., via the cluster’s advanced options). However, we’ve noticed that serverless compute doesn’t expose the same “Advanced Options” UI for Spark config overrides.

Is there a recommended way to apply custom Spark configurations when using serverless compute? Ideally, we’d like to apply these settings at cluster start or job submission time.

Any guidance or best practices would be greatly appreciated!

Below are the configs we are setting:

Overwrite partitions: 

  • spark.sql.sources.partitionOverwriteMode dynamic

Remove default files written (_started, _SUCCESS, _committed):

  • mapreduce.fileoutputcommitter.marksuccessfuljobs false
  • parquet.enable.summary-metadata false
  • spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
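For context, here is roughly what those settings amount to at session level on our current all-purpose clusters (a sketch; we normally put them in the cluster's Spark config under Advanced Options), not something we expect to work as-is on serverless:

# Session-level equivalent of our cluster Spark config (all-purpose compute)

# Only overwrite the partitions present in the incoming DataFrame
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Avoid the extra marker/summary files (_started, _SUCCESS, _committed) on write
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
spark.conf.set("parquet.enable.summary-metadata", "false")
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
)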

Thanks,
Mac

 


4 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @mac_delvalle ,

I'm afraid that when it comes to serverless compute, your options are quite limited. According to the following documentation entry, "Serverless compute does not support setting most Spark properties for notebooks or jobs":

 

https://learn.microsoft.com/en-us/azure/databricks/spark/conf#configure-spark-properties-for-serverl...
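Just to illustrate what that means in practice, here's a minimal sketch, assuming the allowlist described in the linked doc (the supported set may change over time, so verify it there). Supported session-level properties go through spark.conf.set as usual, and as far as I know anything outside the allowlist is rejected with an error:

# Minimal sketch: session-level configuration on serverless compute.
# spark.sql.session.timeZone is assumed to be on the supported allowlist;
# verify against the linked documentation page.
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Probing a property from the original question: if it is not on the
# allowlist, spark.conf.set raises an error instead of silently ignoring it.
try:
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    print("partitionOverwriteMode accepted on this compute")
except Exception as e:
    print(f"partitionOverwriteMode not settable here: {e}")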

mac_delvalle
New Contributor II

Thank you for your reply. Hopefully they'll add this functionality in the future.

szymon_dybczak
Esteemed Contributor III

Hi @mac_delvalle ,

If any of the answers was helpful to you, please consider marking it as the accepted solution. That way, the next person with a similar question will be able to find the answer to their problem more quickly.

nayan_wylde
Honored Contributor

I don't think you will be able to set Spark configurations at the cluster level on serverless, but you can put this in the notebook:

# Enable dynamic partition overwrite for the current session
spark.conf.set(
  "spark.sql.sources.partitionOverwriteMode", "dynamic"
)
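For example, a quick sketch with hypothetical data and path: with dynamic mode, an overwrite only replaces the partitions present in the incoming DataFrame instead of wiping everything under the target location.

from pyspark.sql import functions as F

# Hypothetical demo data spanning two ingest_date partitions
df = spark.range(4).withColumn(
    "ingest_date",
    F.when(F.col("id") < 2, "2024-01-01").otherwise("2024-01-02"),
)

# Only the 2024-01-01 and 2024-01-02 partitions get rewritten; any other
# partitions already under the (hypothetical) path are left untouched.
(df.write
   .mode("overwrite")
   .partitionBy("ingest_date")
   .parquet("/tmp/partition_overwrite_demo"))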