I have a situation where source files in .json.gz format sometimes arrive with invalid syntax: multiple JSON roots separated by empty brackets []. How can I detect this and throw an exception? Currently the code runs, picks up only record set 1, and silently skips all the others without throwing any kind of exception.
Example of a bad source file (sorry for the bad formatting, I kept getting an invalid HTML error when posting):
[
  {
    //...record set 1
    "key": "value",
    "key": "value",
    "key": [
      "value"
    ],
    "key": "value",
    "key": "value",
    "key": "value"
  },
][][
  //... record set 2
][][][][
  // ... record set 3
]
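For what it's worth, a strict JSON parser rejects this shape immediately. Here is a minimal reproduction with Python's standard json module (the literal below is just an illustration of the ][][ pattern, not the real data):

import json

# Anything after the first top-level array is "extra data" to a strict parser,
# so the stray brackets are caught at the first character past the first root.
try:
    json.loads('[{"key": "value"}][][{"key": "value"}]')
except json.JSONDecodeError as e:
    print(e)  # e.g. Extra data: line 1 column 19 (char 18)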
File read code:
# multiLine=True makes Spark treat each input file as a single JSON document
df = spark.read \
    .format("json") \
    .load("<file path>.json.gz", multiLine=True)
print(df.count())
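The only workaround I can think of so far is to pre-validate the raw gzipped text on the driver before handing the path to Spark. This is a minimal sketch, assuming the individual files are small enough to read locally; validate_json_gz is just a name I made up:

import gzip
import json

def validate_json_gz(path):
    # Decompress and parse the whole file; json.loads raises JSONDecodeError
    # ("Extra data") if anything follows the first JSON root.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"Malformed JSON in {path}: {e}") from e

validate_json_gz("<file path>.json.gz")  # call this before the spark.read above

This reads each whole file on the driver, so it only works while the files stay small; I'd still prefer a way to make the Spark read itself fail on the bad input.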