07-03-2025 02:48 PM
My general question:
Does a serverless compute job automatically scale?
The reason I tried a serverless job with the Performance optimization option disabled is to make the job run effortlessly and cost-effectively.
I don't want to do any Spark tuning at all. I did not use any special serverless job compute config and left everything at the defaults in Azure Databricks.
Any suggestions?
I have a table that is normal sized, but there are other processes running on the same serverless compute, and sometimes the job fails with this error:
Failed to stage table A to B / Job aborted due to stage failure: java.lang.RuntimeException: During hash join, the build side was too large to fit in memory (1935662 rows, 765114406 bytes), and Photon was unable to partition it. Many rows likely share the same key. Try disabling Photon by adding set spark.databricks.photon.enabled=false; to your query.
07-03-2025 03:21 PM
More info: the table contains non-UTF-8 characters. Based on my experience, Databricks tends to have issues when one column holds a large amount of text or the text contains non-UTF-8 characters.
07-04-2025 07:21 AM
Hi @lizou1,
Databricks serverless compute jobs automatically scale based on workload, so there's no need to manually configure clusters; Databricks manages that for you.
However, autoscaling doesn't always resolve issues caused by data skew or shared resource limits. Try disabling Photon if many rows share the same key, and consider breaking large joins into smaller steps or optimizing your data layout.
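If you want to try the error message's own suggestion from a Python job step, a minimal sketch could look like the following; table_a, table_b, join_key, and table_b_staged are placeholder names for illustration, not the actual objects in the failing job:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Same setting the error message recommends: SET spark.databricks.photon.enabled=false;
spark.sql("SET spark.databricks.photon.enabled = false")

# Placeholder table and join-key names.
df_a = spark.table("table_a")
df_b = spark.table("table_b")

# Re-run the join with Photon disabled for this session.
result = df_a.join(df_b, on="join_key", how="inner")
result.write.mode("overwrite").saveAsTable("table_b_staged")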
Also, ensure your data is UTF-8 encoded before loading it into Databricks for the best results.
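If the source text needs cleanup first, a minimal sketch could be something like this; raw_table, clean_table, and text_col are placeholder names, and the regex (ASCII control characters plus U+FFFD, the replacement character that invalid UTF-8 bytes usually decode to) may need adjusting for your data:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder table and column names.
df = spark.table("raw_table")

# Strip ASCII control characters (except tab, newline, carriage return)
# and the U+FFFD replacement character from the text column.
cleaned = df.withColumn(
    "text_col",
    F.regexp_replace(F.col("text_col"), r"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFD]", ""),
)

cleaned.write.mode("overwrite").saveAsTable("clean_table")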
07-04-2025 05:17 PM
The table is ~750MB, which isn't super big. Did the job work with classic compute?
07-06-2025 05:37 PM
Serverless is used. I think non-UTF-8 characters in the data make string processing inefficient. I will try to clean up and remove the non-UTF-8 characters and find out whether that is the cause or not.
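To check whether such characters are actually present before cleaning, a rough sketch like this could work (my_table and text_col are placeholder names; the regex only catches control characters and U+FFFD, not every possible encoding issue):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Count rows whose text column contains control characters or U+FFFD.
suspect = spark.table("my_table").filter(
    F.col("text_col").rlike(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFD]")
)
print(suspect.count())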
07-11-2025 05:22 AM
I found a setting for 16 GB vs 32 GB, but that is the notebook memory, not the memory used by the Spark session:
https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/dependencies#high-memory
If you run into out-of-memory errors in your notebook, you can configure the notebook to use a higher memory size. This setting increases the size of the REPL memory used when running code in the notebook. It does not affect the memory size of the Spark session.