On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API:
```
import pyspark.sql.functions as psf

# multiline is needed because each source file is a single JSON document
df = spark.read.format("json").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json")
# explode the 'value' array into one row per element and flatten its fields into columns
df2 = df.select(psf.explode('value').alias('tmp')).select('tmp.*')
df2.write.format("delta").save(DeltaLakeFolder)
```
We don't know the schemas, as they change over time, so the code is kept as generic as possible. However, now that the JSON files have grown beyond 2.8 GB, I see the following error:
```
Caused by: java.lang.IllegalArgumentException: Cannot grow BufferHolder by size 168 because the size after growing exceeds size limitation 2147483632
```
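As far as I understand, with multiline set to true each source file is parsed as a single record, so the whole value array has to fit in one buffer (2147483632 bytes is just under Integer.MAX_VALUE, roughly 2 GB) before the explode ever runs. A quick way to see that, sketched on a small sample rather than the full files (SourceFileFolder and sourcetable are the same variables as above):
```
# Sketch: with multiline=true each input file becomes exactly one row,
# so the entire 'value' array sits in a single record before the explode.
sample = spark.read.format("json").option("multiline", "true") \
    .load(SourceFileFolder + sourcetable + "*.json")
print(sample.count())   # expect one row per input file
sample.printSchema()    # 'value' shows up as one (potentially huge) array column
```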
The JSON looks like this:
```
{
  "@odata.context": "RANDOMSTRING",
  "value": [
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    },
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    },
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    }
  ]
}
```
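One workaround I have been considering, assuming the problem really is the size of the single record per file, is to pre-split each large file on the driver so that no value array comes close to the 2 GB limit before Spark reads it. A rough sketch (chunk_size, the paths, and the split_odata_file name are just placeholders, and it assumes one file still fits in driver memory):
```
import json
import os

def split_odata_file(src_path, out_dir, chunk_size=100000):
    # Split one large OData-style JSON file into smaller files,
    # each holding a slice of the original 'value' array.
    with open(src_path) as f:
        doc = json.load(f)
    rows = doc.get("value", [])
    os.makedirs(out_dir, exist_ok=True)
    for i in range(0, len(rows), chunk_size):
        part_path = os.path.join(out_dir, f"part_{i // chunk_size:05d}.json")
        with open(part_path, "w") as out:
            json.dump({"value": rows[i:i + chunk_size]}, out)
```
That feels like an extra pass over the data, though, so I'd rather handle it in Spark itself if possible.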
How can I resolve this or work around it?
Thanks in advance!
Kind regards,
Dennis