11-29-2022 02:48 AM
@Hare Krishnan The issues you highlighted can be handled by setting .option("mergeSchema", "true") when reading all the files.
Sample code:
spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True)
The only scenario this will not handle is when a nested column has different types across files.
Sample file 1:
{
  "name": "test",
  "check": [
    {
      "id": "1"
    }
  ]
}
Sample file 2:
{
  "name": "test",
  "check": [
    {
      "id": 1
    }
  ]
}
For the two files above, the mergeSchema option will fail because the "id" field inside "check" holds two different types of values: string (in file 1) and int (in file 2).
To handle this scenario, you will have to write a custom function.
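As a rough sketch of what such a custom function might look like, the snippet below normalizes "id" to a string in plain Python before Spark ever sees the data. This is only one possible approach, not a definitive one. The function name stringify_ids and its parameters array_field and key are hypothetical, and the two records are the sample files from above with valid JSON syntax.

```python
import json


def stringify_ids(record, array_field="check", key="id"):
    """Return a copy of `record` where record[array_field][*][key] is a str.

    Hypothetical helper: the names and defaults are assumptions chosen to
    match the sample files in this post.
    """
    fixed = dict(record)
    if isinstance(record.get(array_field), list):
        fixed[array_field] = [
            # Cast the conflicting field to str; leave other items untouched.
            {**item, key: str(item[key])}
            if isinstance(item, dict) and item.get(key) is not None
            else item
            for item in record[array_field]
        ]
    return fixed


# The two sample files, parsed as Python dicts.
file1 = json.loads('{"name": "test", "check": [{"id": "1"}]}')
file2 = json.loads('{"name": "test", "check": [{"id": 1}]}')

normalized = [stringify_ids(r) for r in (file1, file2)]
print(normalized)
```

After this step both records carry "id" as a string, so they share one schema. The normalized records could then be written back out as JSON and read with spark.read.json as before. Depending on your setup, passing an explicit schema to the reader with "id" declared as a string may be a simpler alternative worth trying first.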