On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API:
```
import pyspark.sql.functions as psf

# multiline is needed because each source file is a single JSON document
df = spark.read.format("json").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json")
# explode the 'value' array into one row per element and flatten its fields into columns
df2 = df.select(psf.explode('value').alias('tmp')).select('tmp.*')
df2.write.format("delta").save(DeltaLakeFolder)
```
We don't know the schemas, as they change over time, so the code is kept as generic as possible. However, now that the JSON files have grown beyond 2.8 GB, I see the following error:
```
Caused by: java.lang.IllegalArgumentException: Cannot grow BufferHolder by size 168 because the size after growing exceeds size limitation 2147483632
```
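As far as I understand, with multiline set to true each source file is parsed as a single record, so the whole value array has to fit in one buffer (2147483632 bytes is just under Integer.MAX_VALUE, roughly 2 GB) before the explode ever runs. A quick way to see that, sketched on a small sample rather than the full files (SourceFileFolder and sourcetable are the same variables as above):
```
# Sketch: with multiline=true each input file becomes exactly one row,
# so the entire 'value' array sits in a single record before the explode.
sample = spark.read.format("json").option("multiline", "true") \
    .load(SourceFileFolder + sourcetable + "*.json")
print(sample.count())   # expect one row per input file
sample.printSchema()    # 'value' shows up as one (potentially huge) array column
```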
The JSON looks like this:
```
{
  "@odata.context": "RANDOMSTRING",
  "value": [
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    },
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    },
    {
      "COL1": null,
      "COL2": "VAL2",
      "COL3": "VAL3",
      "COL4": "VAL4",
      "COL5": "VAL5",
      "COL6": "VAL6",
      "COL8": "VAL7",
      "COL9": null
    }
  ]
}
```
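One workaround I have been considering, assuming the problem really is the size of the single record per file, is to pre-split each large file on the driver so that no value array comes close to the 2 GB limit before Spark reads it. A rough sketch (chunk_size, the paths, and the split_odata_file name are just placeholders, and it assumes one file still fits in driver memory):
```
import json
import os

def split_odata_file(src_path, out_dir, chunk_size=100000):
    # Split one large OData-style JSON file into smaller files,
    # each holding a slice of the original 'value' array.
    with open(src_path) as f:
        doc = json.load(f)
    rows = doc.get("value", [])
    os.makedirs(out_dir, exist_ok=True)
    for i in range(0, len(rows), chunk_size):
        part_path = os.path.join(out_dir, f"part_{i // chunk_size:05d}.json")
        with open(part_path, "w") as out:
            json.dump({"value": rows[i:i + chunk_size]}, out)
```
That feels like an extra pass over the data, though, so I'd rather handle it in Spark itself if possible.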
How can I resolve this or work around it?
Thanks in advance!
Kind regards,
Dennis