<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132854#M49652</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/111443"&gt;@LarsMewa&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With serverless compute, you cannot change the size of the executor or the driver. Are you seeing the issue only when processing the JSON and the CSV files together?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error indicates a&amp;nbsp;&lt;STRONG&gt;serverless resource constraint&lt;/STRONG&gt;: serverless compute has limited off-heap memory for Photon. When you load &lt;EM&gt;both&lt;/EM&gt; the CSV and JSON files, Photon has less room left to allocate its large parsing buffer.&lt;/P&gt;
&lt;P&gt;Kindly let me know if you have any questions.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Sep 2025 14:45:59 GMT</pubDate>
    <dc:creator>Saritha_S</dc:creator>
    <dc:date>2025-09-23T14:45:59Z</dc:date>
    <item>
      <title>Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132831#M49644</link>
      <description>&lt;P&gt;I'm getting the following SparkOutOfMemoryError while reading a 500 MB JSON file, see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, and the same goes when I load everything on a regular cluster.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone have an idea how to tweak serverless so it can read the JSON while processing the CSV files?&lt;/P&gt;&lt;P&gt;Job aborted due to stage failure: Task 0 in stage 153.0 failed 4 times, most recent failure: Lost task 0.3 in stage 153.0 (TID 551) (10.46.122.241 executor 0): org.apache.spark.memory.SparkOutOfMemoryError: Photon ran out of memory while executing this query.&lt;BR /&gt;Photon failed to reserve 768.0 MiB for simdjson internal usage, in SimdJsonReader, in JsonFileScanNode(id=8883, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 3 more]), in task.&lt;BR /&gt;Memory usage:&lt;BR /&gt;Total task memory (including non-Photon): 1152.0 MiB&lt;BR /&gt;task: allocated 262.1 MiB, tracked 1152.0 MiB, untracked allocated 0.0 B, peak 1152.0 MiB&lt;BR /&gt;BufferPool: allocated 6.1 MiB, tracked 128.0 MiB, untracked allocated 0.0 B, peak 128.0 MiB&lt;BR /&gt;DataWriter: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Photon Protobuf Plan Arena: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 110.8 KiB&lt;BR /&gt;JsonFileScanNode(id=8883, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 
3 more]): allocated 256.0 MiB, tracked 1024.0 MiB, untracked allocated 0.0 B, peak 1024.0 MiB&lt;BR /&gt;JniReader: allocated 1984.0 B, tracked 1984.0 B, untracked allocated 0.0 B, peak 1984.0 B&lt;BR /&gt;SimdJsonReader: allocated 256.0 MiB, tracked 1024.0 MiB, untracked allocated 0.0 B, peak 1024.0 MiB&lt;BR /&gt;JSON buffer: allocated 256.0 MiB, tracked 256.0 MiB, untracked allocated 0.0 B, peak 256.0 MiB&lt;BR /&gt;simdjson internal usage: allocated 0.0 B, tracked 768.0 MiB, untracked allocated 0.0 B, peak 768.0 MiB&lt;BR /&gt;ProjectNode(id=8893, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 3 more]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;ProjectNode(id=8908, output_schema=[string, struct&amp;lt;string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 4 more&amp;gt;]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;SortNode(id=8911, output_schema=[string, struct&amp;lt;string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 
4 more&amp;gt;]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Sorter: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;spilled run buffers: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;output batch var len data: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Memory consumers:&lt;BR /&gt;Acquired by com.databricks.photon.NativeMemoryConsumer@9cc6126: 1152.0 MiB&lt;/P&gt;&lt;P&gt;at 0xbca6493 &amp;lt;photon&amp;gt;.CreateReservationError(external/workspace_spark_3_5/photon/common/memory-tracker.cc:561)&lt;BR /&gt;at 0xbca51c7 &amp;lt;photon&amp;gt;.GrowBuffer(external/workspace_spark_3_5/photon/io/json/simd-json-reader.cc:295)&lt;BR /&gt;at 0x77b6d5f &amp;lt;photon&amp;gt;.TryLoadDocumentsFromStream(external/workspace_spark_3_5/photon/io/json/simd-json-reader.cc:313)&lt;BR /&gt;at 0x77b70e3 &amp;lt;photon&amp;gt;.HasNext(external/workspace_spark_3_5/photon/io/json/simd-json-reader.cc:365)&lt;BR /&gt;at 0x6e5444b &amp;lt;photon&amp;gt;.ReaderHasNext(external/workspace_spark_3_5/photon/exec-nodes/common-file-scan-node.h:139)&lt;BR /&gt;at 0x6e5405b &amp;lt;photon&amp;gt;.HasNextImpl(external/workspace_spark_3_5/photon/exec-nodes/json-file-scan-node.cc:121)&lt;BR /&gt;at 0x6d7c5e7 &amp;lt;photon&amp;gt;.OpenImpl(external/workspace_spark_3_5/photon/exec-nodes/sort-node.cc:140)&lt;BR /&gt;at com.databricks.photon.JniApiImpl.open(Native Method)&lt;BR /&gt;at com.databricks.photon.JniApi.open(JniApi.scala)&lt;BR /&gt;at com.databricks.photon.JniExecNode.open(JniExecNode.java:73)&lt;BR /&gt;at com.databricks.photon.PhotonColumnarBatchResultHandler.$anonfun$getResult$4(PhotonExec.scala:1224)&lt;BR /&gt;at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)&lt;BR /&gt;at com.databricks.photon.PhotonResultHandler.timeit(PhotonResultHandler.scala:30)&lt;BR /&gt;at 
com.databricks.photon.PhotonResultHandler.timeit$(PhotonResultHandler.scala:28)&lt;BR /&gt;at com.databricks.photon.PhotonColumnarBatchResultHandler.timeit(PhotonExec.scala:1216)&lt;BR /&gt;at com.databricks.photon.PhotonColumnarBatchResultHandler.getResult(PhotonExec.scala:1224)&lt;BR /&gt;at com.databricks.photon.PhotonBasicEvaluatorFactory$PhotonBasicEvaluator$$anon$1.open(PhotonBasicEvaluatorFactory.scala:252)&lt;BR /&gt;at com.databricks.photon.PhotonBasicEvaluatorFactory$PhotonBasicEvaluator$$anon$1.hasNextImpl(PhotonBasicEvaluatorFactory.scala:257)&lt;BR /&gt;at com.databricks.photon.PhotonBasicEvaluatorFactory$PhotonBasicEvaluator$$anon$1.$anonfun$hasNext$1(PhotonBasicEvaluatorFactory.scala:275)&lt;BR /&gt;at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)&lt;BR /&gt;at com.databricks.photon.metrics.BillableTimeTaskMetrics.withPhotonBilling(BillableTimeTaskMetrics.scala:71)&lt;BR /&gt;at org.apache.spark.TaskContext.runFuncAsBillable(TaskContext.scala:267)&lt;BR /&gt;at com.databricks.photon.PhotonBasicEvaluatorFactory$PhotonBasicEvaluator$$anon$1.hasNext(PhotonBasicEvaluatorFactory.scala:275)&lt;BR /&gt;at com.databricks.photon.CloseableIterator$$anon$10.hasNext(CloseableIterator.scala:211)&lt;BR /&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;BR /&gt;at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.SortAggregateExec.$anonfun$doExecute$1(SortAggregateExec.scala:67)&lt;BR /&gt;at 
org.apache.spark.sql.execution.aggregate.SortAggregateExec.$anonfun$doExecute$1$adapted(SortAggregateExec.scala:64)&lt;BR /&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:932)&lt;BR /&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:932)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)&lt;BR /&gt;at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:420)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:417)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:384)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)&lt;BR /&gt;at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:420)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:417)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:384)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:83)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:82)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)&lt;BR /&gt;at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:227)&lt;BR /&gt;at org.apache.spark.scheduler.Task.doRunTask(Task.scala:204)&lt;BR /&gt;at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:166)&lt;BR /&gt;at 
com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)&lt;BR /&gt;at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)&lt;BR /&gt;at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)&lt;BR /&gt;at scala.util.Using$.resource(Using.scala:269)&lt;BR /&gt;at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)&lt;BR /&gt;at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:160)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.scheduler.Task.run(Task.scala:105)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$11(Executor.scala:1227)&lt;BR /&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)&lt;BR /&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:112)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1231)&lt;BR /&gt;at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)&lt;BR /&gt;at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:1083)&lt;BR /&gt;at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)&lt;BR /&gt;at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)&lt;BR /&gt;at java.base/java.lang.Thread.run(Thread.java:840)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;Photon ran out of memory while executing this query.&lt;BR /&gt;Photon failed to reserve 768.0 MiB for simdjson internal usage, in SimdJsonReader, in JsonFileScanNode(id=8883, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, string, ... 3 more]), in task.&lt;BR /&gt;Memory usage:&lt;BR /&gt;Total task memory (including non-Photon): 1152.0 MiB&lt;BR /&gt;task: allocated 262.1 MiB, tracked 1152.0 MiB, untracked allocated 0.0 B, peak 1152.0 MiB&lt;BR /&gt;BufferPool: allocated 6.1 MiB, tracked 128.0 MiB, untracked allocated 0.0 B, peak 128.0 MiB&lt;BR /&gt;DataWriter: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Photon Protobuf Plan Arena: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 110.8 KiB&lt;BR /&gt;JsonFileScanNode(id=8883, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 3 more]): allocated 256.0 MiB, tracked 1024.0 MiB, untracked allocated 0.0 B, peak 1024.0 MiB&lt;BR /&gt;JniReader: allocated 1984.0 B, tracked 1984.0 B, untracked allocated 0.0 B, peak 1984.0 B&lt;BR /&gt;SimdJsonReader: allocated 256.0 MiB, tracked 1024.0 MiB, untracked allocated 0.0 B, peak 1024.0 MiB&lt;BR /&gt;JSON buffer: allocated 256.0 MiB, tracked 256.0 MiB, untracked allocated 0.0 B, peak 256.0 MiB&lt;BR /&gt;simdjson internal usage: allocated 0.0 B, tracked 768.0 MiB, untracked allocated 0.0 B, peak 768.0 MiB&lt;BR /&gt;ProjectNode(id=8893, output_schema=[string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 3 more]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;ProjectNode(id=8908, output_schema=[string, struct&amp;lt;string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 
4 more&amp;gt;]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;SortNode(id=8911, output_schema=[string, struct&amp;lt;string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, ... 4 more&amp;gt;]): allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Sorter: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;spilled run buffers: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;output batch var len data: allocated 0.0 B, tracked 0.0 B, untracked allocated 0.0 B, peak 0.0 B&lt;BR /&gt;Memory consumers:&lt;BR /&gt;Acquired by com.databricks.photon.NativeMemoryConsumer@9cc6126: 1152.0 MiB&lt;/P&gt;</description>
      <pubDate>Tue, 23 Sep 2025 12:38:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132831#M49644</guid>
      <dc:creator>LarsMewa</dc:creator>
      <dc:date>2025-09-23T12:38:59Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132847#M49649</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/111443"&gt;@LarsMewa&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Could you try increasing your driver and executor memory, and try defining the schema explicitly instead of inferring it?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Sep 2025 13:44:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132847#M49649</guid>
      <dc:creator>jayanta1</dc:creator>
      <dc:date>2025-09-23T13:44:58Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132848#M49650</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/186209"&gt;@jayanta1&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you guide me on how to do that?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Sep 2025 14:03:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132848#M49650</guid>
      <dc:creator>LarsMewa</dc:creator>
      <dc:date>2025-09-23T14:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132854#M49652</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/111443"&gt;@LarsMewa&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With serverless compute, you cannot change the size of the executor or the driver. Are you seeing the issue only when processing the JSON and the CSV files together?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error indicates a&amp;nbsp;&lt;STRONG&gt;serverless resource constraint&lt;/STRONG&gt;: serverless compute has limited off-heap memory for Photon. When you load &lt;EM&gt;both&lt;/EM&gt; the CSV and JSON files, Photon has less room left to allocate its large parsing buffer.&lt;/P&gt;
&lt;P&gt;Kindly let me know if you have any questions.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Sep 2025 14:45:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132854#M49652</guid>
      <dc:creator>Saritha_S</dc:creator>
      <dc:date>2025-09-23T14:45:59Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs &amp; Pipelines: Serverless SparkOutOfMemoryError while reading 500mb json file</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132913#M49674</link>
      <description>&lt;P&gt;&lt;SPAN&gt;This fixed it:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;As a quick workaround for out-of-memory errors when processing large JSON files in Databricks serverless pipelines, disable the Photon JSON scan. The Photon engine is optimized for performance, but scanning large JSON files with it can use up to 7x the raw file size in memory.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Disable the Photon JSON scan by adding this configuration to your pipeline or notebook:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE&gt;set spark.databricks.photon.jsonScan.enabled=false&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;This forces the engine to use the standard Spark JSON reader, which uses far less memory. There may be a slight performance trade-off, but it is the most reliable solution for large JSON datasets in serverless workflows.&lt;/P&gt;</description>
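The workaround above amounts to one session configuration. A minimal Python sketch, assuming a live SparkSession named spark (predefined in a Databricks notebook; the SQL form shown in the reply is equivalent):

```python
def disable_photon_json_scan(spark):
    # Fall back to the standard Spark JSON reader for this session.
    # Photon's simdjson scan can reserve several times the raw file size,
    # which exhausts the fixed off-heap budget on serverless compute.
    spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")

# In a Databricks notebook or pipeline, `spark` already exists:
# disable_photon_json_scan(spark)
```

Setting it via spark.conf applies only to the current session, so other Photon-accelerated reads in the workspace are unaffected.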
      <pubDate>Wed, 24 Sep 2025 09:37:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-amp-pipelines-serverless-sparkoutofmemoryerror/m-p/132913#M49674</guid>
      <dc:creator>LarsMewa</dc:creator>
      <dc:date>2025-09-24T09:37:23Z</dc:date>
    </item>
  </channel>
</rss>

