Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

serverless job compute error

lizou1
New Contributor III

My general question:

Does a serverless compute job automatically scale?

The reason I'm trying serverless job compute with the performance optimization option disabled is to make jobs run effortlessly and cost-effectively.

I don't want to do any Spark tuning at all. I didn't use any special serverless job compute configuration and left everything at the defaults in Azure Databricks.

Any suggestions?

I have a table that is normal sized, but there are other processes running on the same serverless compute, and sometimes the job fails with this error:

Failed to stage table A to B / Job aborted due to stage failure: java.lang.RuntimeException: During hash join, the build side was too large to fit in memory (1935662 rows, 765114406 bytes), and Photon was unable to partition it. Many rows likely share the same key. Try disabling Photon by adding set spark.databricks.photon.enabled=false; to your query.
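
For reference, here is a minimal sketch of the workaround the error itself suggests, run before the failing query. It assumes the config can be set at the session level in a serverless environment; if not, the SQL form quoted in the error can be run directly in the query:

# Workaround suggested by the error message: turn Photon off for this session/query.
# Whether this config is settable on serverless may depend on the workspace.
spark.conf.set("spark.databricks.photon.enabled", "false")

# Equivalent SQL form, as quoted in the error message:
spark.sql("SET spark.databricks.photon.enabled = false")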

[screenshot attached: lizou1_0-1751579071982.png]
5 REPLIES

lizou1
New Contributor III

More info: the table contains non-UTF-8 characters. In my experience, Databricks tends to have issues when a single column holds a large amount of text or when the text contains non-UTF-8 characters.
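
A rough sketch of how the suspicious rows could be flagged, assuming placeholder names my_table and text_col. Once data is in a Spark string column, invalid bytes often surface as the Unicode replacement character, so that is a cheap thing to check for, along with unusually long values:

from pyspark.sql import functions as F

# Placeholder table/column names; adjust to the real schema.
df = spark.table("my_table")

suspicious = df.filter(
    F.col("text_col").contains("\ufffd")      # replacement character left by bad decoding
    | (F.length("text_col") > 100000)         # unusually long text values
)
print(suspicious.count())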

SP_6721
Contributor III

Hi @lizou1 ,

Databricks serverless job compute automatically scales based on workload, so there's no need to configure clusters manually; Databricks manages that for you.

However, autoscaling doesn't always resolve issues caused by data skew or shared resource limits. Try disabling Photon if many rows share the same key, and consider breaking large joins into smaller steps (see the sketch at the end of this reply) or optimizing your data layout.

Also, ensure your data is UTF-8 encoded before loading it into Databricks for the best results.
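
To make the "smaller steps" idea concrete, here is a minimal sketch assuming hypothetical names table_a (probe side), table_b (the build side that overflowed), and a join column join_key. Since the error says many build-side rows share the same key, collapsing the build side to one row per key before the join is one way to shrink the hash table the join has to build:

# Hypothetical names; the real tables and keys come from the failing job.
big_df = spark.table("table_a")      # probe side
build_df = spark.table("table_b")    # build side that was too large to fit in memory

# Keep only the columns needed downstream and collapse duplicate keys
# so the in-memory hash table built for the join is much smaller.
build_slim = (
    build_df
    .select("join_key", "needed_col")
    .dropDuplicates(["join_key"])
)

result = big_df.join(build_slim, on="join_key", how="left")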

Sharanya13
Contributor III

The table is ~750MB, which isn't super big. Did the job work with classic compute?

lizou1
New Contributor III

Serverless is used. I think the non-UTF-8 characters in the data make string processing inefficient. I will try to clean up and remove the non-UTF-8 characters, and I'll find out whether or not that's the cause.
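
One possible cleanup pass, again with placeholder names my_table and text_col; the character class is only a starting point and should be adjusted to whatever "non-UTF-8" actually looks like in the data:

from pyspark.sql import functions as F

df = spark.table("my_table")   # placeholder table name

cleaned = df.withColumn(
    "text_col",
    F.regexp_replace(
        "text_col",
        # strip the replacement character and ASCII control characters
        r"[\ufffd\x00-\x08\x0B\x0C\x0E-\x1F]",
        ""
    )
)

cleaned.write.mode("overwrite").saveAsTable("my_table_cleaned")   # placeholder target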

lizou1
New Contributor III

I found a setting for 16 GB vs. 32 GB of memory, but that is part of the memory used by Spark:

https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/dependencies#high-memory

If you run into out-of-memory errors in your notebook, you can configure the notebook to use a higher memory size. This setting increases the size of the REPL memory used when running code in the notebook. It does not affect the memory size of the Spark session.