Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Change spark configs in Serverless compute clusters

ls
New Contributor III

Howdy!
I want to know how I can change some Spark configs on Serverless compute. I have a base.yml file and tried placing:

spark_conf:
  spark.driver.maxResultSize: "16g"

but I still get this error:

[CONFIG_NOT_AVAILABLE] Configuration spark.driver.maxResultSize is not available. SQLSTATE: 42K0I

and trying to change the config within the notebook is not allowed either.


3 REPLIES

Walter_C
Databricks Employee

Spark configs are limited on Serverless; these are the supported configs you can set: https://docs.databricks.com/en/release-notes/serverless/index.html#supported-spark-configuration-par... 
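For anyone landing here: if the workload can run on classic job compute instead, `spark_conf` in a bundle YAML is a plain mapping under the cluster definition, not a list. A minimal sketch (job name, runtime version, and node type below are hypothetical; this setting still won't apply on serverless):

```yaml
resources:
  jobs:
    my_job:                      # hypothetical job name
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: "15.4.x-scala2.12"   # example runtime
            node_type_id: "i3.xlarge"           # example node type
            num_workers: 2
            spark_conf:
              spark.driver.maxResultSize: "16g"
```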

ls
New Contributor III

Is there anything I can do to increase the memory? Or do you know of a way I could make it not run out of memory? Here is the code block:

from datetime import datetime, timezone

dt = datetime.strptime(input_date, "%Y/%m/%d")
buffer_sec = 6

# Millisecond window covering one UTC day, padded by buffer_sec on each side
timestamp_start_ms = int((dt.replace(tzinfo=timezone.utc).timestamp() - buffer_sec) * 1000)
timestamp_end_ms = int((timestamp_start_ms + (24 * 3600 * 1000)) + buffer_sec * 2 * 1000)

interpolated_filtered = f"SELECT * FROM `catalog`.default.events \
WHERE timestamp >= {timestamp_start_ms} AND timestamp <= {timestamp_end_ms} ORDER BY timestamp ASC"
interpolated_df = spark.sql(interpolated_filtered).toPandas()

Walter_C
Databricks Employee

To address the memory issue in your Serverless compute environment, you can consider the following strategies:

  1. Optimize the Query:

    • Filter Early: Ensure that you are filtering the data as early as possible in your query to reduce the amount of data being processed. For example, if you can add more specific conditions to your WHERE clause, it will help in reducing the data size.
    • Limit Columns: Select only the necessary columns instead of using SELECT *. This reduces the amount of data being transferred and processed.
  2. Use Spark DataFrame Operations:

    • Instead of converting the entire result to a Pandas DataFrame using toPandas(), try to perform as many operations as possible using Spark DataFrame operations. Spark DataFrames are distributed and can handle larger datasets more efficiently than Pandas DataFrames.
  3. Use Delta Tables:

    • If you are working with large datasets, consider using Delta tables. Delta tables provide optimized storage and query performance, which can help in managing memory usage more efficiently.
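Applied to the snippet above, those suggestions could look like the sketch below. This is only an illustration: the extra column names passed to the query builder are hypothetical, and the `spark` session calls are shown in comments since they only exist inside a Databricks notebook.

```python
from datetime import datetime, timezone


def event_window_ms(input_date: str, buffer_sec: int = 6) -> tuple[int, int]:
    """Millisecond window for one UTC day, padded by buffer_sec on each side."""
    dt = datetime.strptime(input_date, "%Y/%m/%d").replace(tzinfo=timezone.utc)
    start_ms = int((dt.timestamp() - buffer_sec) * 1000)
    end_ms = start_ms + 24 * 3600 * 1000 + 2 * buffer_sec * 1000
    return start_ms, end_ms


def build_events_query(start_ms: int, end_ms: int, columns: list[str]) -> str:
    """Select only the needed columns (instead of SELECT *) for the window."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM `catalog`.default.events "
        f"WHERE timestamp >= {start_ms} AND timestamp <= {end_ms} "
        f"ORDER BY timestamp ASC"
    )


# Usage in a Databricks notebook (not runnable outside one):
# start_ms, end_ms = event_window_ms(input_date)
# df = spark.sql(build_events_query(start_ms, end_ms, ["timestamp", "device_id", "value"]))
# df = df.filter("value IS NOT NULL")   # keep work in Spark as long as possible
# small_pdf = df.limit(1_000_000).toPandas()  # convert only a bounded result
```

The point of the split is that filtering and column pruning happen on the Spark side, so only a bounded, trimmed result ever crosses into the driver as a Pandas DataFrame.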
