LarsMewa
New Contributor III

This fixed it:

A quick workaround for out-of-memory errors when processing large JSON files in Databricks serverless pipelines is to disable the Photon JSON scan. Photon is optimized for performance, but scanning large JSON files with it can use up to 7x the raw file size in memory.

Disable the Photon JSON scan by adding this configuration to your pipeline or notebook:

set spark.databricks.photon.jsonScan.enabled=false
This forces the engine to fall back to the standard Spark JSON reader, which uses far less memory. There may be a slight performance trade-off, but it is the most reliable fix for large JSON datasets in serverless workflows.
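If your notebook uses Python instead of SQL, the same flag can be set through the Spark conf API before reading the data. This is a minimal sketch, not a tested recipe: the file path is a placeholder, and only the config key itself comes from the workaround above.

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession already exists; getOrCreate() returns it.
spark = SparkSession.builder.getOrCreate()

# Fall back from Photon's JSON scan to the standard Spark JSON reader.
# Only this config key comes from the workaround above.
spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")

# Placeholder path -- substitute the location of your large JSON files.
df = spark.read.json("/Volumes/catalog/schema/raw/large_events.json")
```

Setting the conf in the same cell (or an earlier one) ensures subsequent JSON reads in that session use the standard reader.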
