Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

serverless job compute error

lizou1
New Contributor III

My general question:

Does a serverless compute job automatically scale?

The reason I'm trying serverless job compute with the performance optimization option disabled is to make jobs run effortlessly and cost-effectively.

I don't want to do any Spark tuning at all. I didn't use any special serverless job compute configuration and left everything at the defaults in Azure Databricks.

Any suggestions?

I have a table that is normal sized, but there are other processes running on the same serverless compute, and sometimes the job fails with this error:

Failed to stage table A to B / Job aborted due to stage failure: java.lang.RuntimeException: During hash join, the build side was too large to fit in memory (1935662 rows, 765114406 bytes), and Photon was unable to partition it. Many rows likely share the same key. Try disabling Photon by adding set spark.databricks.photon.enabled=false; to your query.
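
For reference, here is a minimal sketch of the workaround the error itself suggests, run before the failing query. It assumes the config can be set at the session level in a serverless environment; if not, the SQL form quoted in the error can be run directly in the query:

# Workaround suggested by the error message: turn Photon off for this session/query.
# Whether this config is settable on serverless may depend on the workspace.
spark.conf.set("spark.databricks.photon.enabled", "false")

# Equivalent SQL form, as quoted in the error message:
spark.sql("SET spark.databricks.photon.enabled = false")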

[screenshot attached: lizou1_0-1751579071982.png]
5 REPLIES

lizou1
New Contributor III

More info: the table contains non-UTF-8 characters. In my experience, Databricks tends to have issues when a single column holds a large amount of text or when the text contains non-UTF-8 characters.
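
A rough sketch of how the suspicious rows could be flagged, assuming placeholder names my_table and text_col. Once data is in a Spark string column, invalid bytes often surface as the Unicode replacement character, so that is a cheap thing to check for, along with unusually long values:

from pyspark.sql import functions as F

# Placeholder table/column names; adjust to the real schema.
df = spark.table("my_table")

suspicious = df.filter(
    F.col("text_col").contains("\ufffd")      # replacement character left by bad decoding
    | (F.length("text_col") > 100000)         # unusually long text values
)
print(suspicious.count())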

SP_6721
Contributor III

Hi @lizou1 ,

Databricks serverless job compute automatically scales based on workload, so there's no need to configure clusters manually; Databricks manages that for you.

However, autoscaling doesn't always resolve issues caused by data skew or shared resource limits. Try disabling Photon if many rows share the same key, and consider breaking large joins into smaller steps (see the sketch at the end of this reply) or optimizing your data layout.

Also, ensure your data is UTF-8 encoded before loading it into Databricks for the best results.
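
To make the "smaller steps" idea concrete, here is a minimal sketch assuming hypothetical names table_a (probe side), table_b (the build side that overflowed), and a join column join_key. Since the error says many build-side rows share the same key, collapsing the build side to one row per key before the join is one way to shrink the hash table the join has to build:

# Hypothetical names; the real tables and keys come from the failing job.
big_df = spark.table("table_a")      # probe side
build_df = spark.table("table_b")    # build side that was too large to fit in memory

# Keep only the columns needed downstream and collapse duplicate keys
# so the in-memory hash table built for the join is much smaller.
build_slim = (
    build_df
    .select("join_key", "needed_col")
    .dropDuplicates(["join_key"])
)

result = big_df.join(build_slim, on="join_key", how="left")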

Sharanya13
Contributor III

The table is ~750MB, which isn't super big. Did the job work with classic compute?

lizou1
New Contributor III

Serverless is used. I think the non-UTF-8 characters in the data make string processing inefficient. I will try to clean up and remove the non-UTF-8 characters, and I'll find out whether or not that's the cause.
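
One possible cleanup pass, again with placeholder names my_table and text_col; the character class is only a starting point and should be adjusted to whatever "non-UTF-8" actually looks like in the data:

from pyspark.sql import functions as F

df = spark.table("my_table")   # placeholder table name

cleaned = df.withColumn(
    "text_col",
    F.regexp_replace(
        "text_col",
        # strip the replacement character and ASCII control characters
        r"[\ufffd\x00-\x08\x0B\x0C\x0E-\x1F]",
        ""
    )
)

cleaned.write.mode("overwrite").saveAsTable("my_table_cleaned")   # placeholder target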

lizou1
New Contributor III

I found a setting for 16 GB vs. 32 GB of memory, but that is part of the memory used by Spark:

https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/dependencies#high-memory

If you run into out-of-memory errors in your notebook, you can configure the notebook to use a higher memory size. This setting increases the size of the REPL memory used when running code in the notebook. It does not affect the memory size of the Spark session.