Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Change spark configs in Serverless compute clusters

ls
New Contributor III

Howdy!
I want to know how I can change some Spark configs on Serverless compute. I have a base.yml file and tried placing:

spark_conf:
  spark.driver.maxResultSize: "16g"

but I still get this error:

[CONFIG_NOT_AVAILABLE] Configuration spark.driver.maxResultSize is not available. SQLSTATE: 42K0I

and trying to change the config within the notebook is not allowed either.


3 REPLIES

Walter_C
Databricks Employee

Spark configs are limited on Serverless; these are the supported configs you can set: https://docs.databricks.com/en/release-notes/serverless/index.html#supported-spark-configuration-par... 
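For anyone landing here: if the workload can run on classic job compute instead, `spark_conf` in a bundle YAML is a plain mapping under the cluster definition, not a list. A minimal sketch (job name, runtime version, and node type below are hypothetical; this setting still won't apply on serverless):

```yaml
resources:
  jobs:
    my_job:                      # hypothetical job name
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: "15.4.x-scala2.12"   # example runtime
            node_type_id: "i3.xlarge"           # example node type
            num_workers: 2
            spark_conf:
              spark.driver.maxResultSize: "16g"
```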

ls
New Contributor III

Is there anything I can do to increase the memory? Or do you know of a way I could make it not run out of memory? Here is the code block:

from datetime import datetime, timezone

dt = datetime.strptime(input_date, "%Y/%m/%d")
buffer_sec = 6

# Millisecond window covering one UTC day, padded by buffer_sec on each side
timestamp_start_ms = int((dt.replace(tzinfo=timezone.utc).timestamp() - buffer_sec) * 1000)
timestamp_end_ms = int((timestamp_start_ms + (24 * 3600 * 1000)) + buffer_sec * 2 * 1000)

interpolated_filtered = f"SELECT * FROM `catalog`.default.events \
WHERE timestamp >= {timestamp_start_ms} AND timestamp <= {timestamp_end_ms} ORDER BY timestamp ASC"
interpolated_df = spark.sql(interpolated_filtered).toPandas()

Walter_C
Databricks Employee

To address the memory issue in your Serverless compute environment, you can consider the following strategies:

  1. Optimize the Query:

    • Filter Early: Ensure that you are filtering the data as early as possible in your query to reduce the amount of data being processed. For example, if you can add more specific conditions to your WHERE clause, it will help in reducing the data size.
    • Limit Columns: Select only the necessary columns instead of using SELECT *. This reduces the amount of data being transferred and processed.
  2. Use Spark DataFrame Operations:

    • Instead of converting the entire result to a Pandas DataFrame using toPandas(), try to perform as many operations as possible using Spark DataFrame operations. Spark DataFrames are distributed and can handle larger datasets more efficiently than Pandas DataFrames.
  3. Use Delta Tables:

    • If you are working with large datasets, consider using Delta tables. Delta tables provide optimized storage and query performance, which can help in managing memory usage more efficiently.
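Applied to the snippet above, those suggestions could look like the sketch below. This is only an illustration: the extra column names passed to the query builder are hypothetical, and the `spark` session calls are shown in comments since they only exist inside a Databricks notebook.

```python
from datetime import datetime, timezone


def event_window_ms(input_date: str, buffer_sec: int = 6) -> tuple[int, int]:
    """Millisecond window for one UTC day, padded by buffer_sec on each side."""
    dt = datetime.strptime(input_date, "%Y/%m/%d").replace(tzinfo=timezone.utc)
    start_ms = int((dt.timestamp() - buffer_sec) * 1000)
    end_ms = start_ms + 24 * 3600 * 1000 + 2 * buffer_sec * 1000
    return start_ms, end_ms


def build_events_query(start_ms: int, end_ms: int, columns: list[str]) -> str:
    """Select only the needed columns (instead of SELECT *) for the window."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM `catalog`.default.events "
        f"WHERE timestamp >= {start_ms} AND timestamp <= {end_ms} "
        f"ORDER BY timestamp ASC"
    )


# Usage in a Databricks notebook (not runnable outside one):
# start_ms, end_ms = event_window_ms(input_date)
# df = spark.sql(build_events_query(start_ms, end_ms, ["timestamp", "device_id", "value"]))
# df = df.filter("value IS NOT NULL")   # keep work in Spark as long as possible
# small_pdf = df.limit(1_000_000).toPandas()  # convert only a bounded result
```

The point of the split is that filtering and column pruning happen on the Spark side, so only a bounded, trimmed result ever crosses into the driver as a Pandas DataFrame.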
