Shalabh007
Honored Contributor

@Hare Krishnan the issues you highlighted can be handled by setting .option("mergeSchema", "true") when reading all the files.

Sample code:

spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True)

The only scenario this will not be able to handle is when the type of a field inside your nested column is not the same across files.

Sample file 1:

{
	"name": "test",
	"check": [
		{
			"id": "1"
		}
	]
}

Sample file 2:

{
	"name": "test",
	"check": [
		{
			"id": 1
		}
	]
}

For the above two files, the mergeSchema option will fail because the "id" field inside "check" has two different types of values: string (in file 1) and int (in file 2).

To handle this scenario, you will have to write a custom function that casts the conflicting field to a common type.
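
One possible shape for such a custom function is sketched below. It is only a minimal illustration in plain Python, assuming the conflict is limited to the "id" field inside "check" and that casting it to a string is acceptable for your data. The name normalize_check_ids is hypothetical and not part of Spark.

```python
import json


def normalize_check_ids(record: dict) -> dict:
    """Return a copy of `record` with every check[].id cast to a string.

    Hypothetical helper: assumes "check" is a list of objects and that
    string is an acceptable common type for "id".
    """
    normalized = dict(record)
    if "check" in record:
        normalized["check"] = [
            # Cast "id" to str when present; copy other items unchanged.
            {**item, "id": str(item["id"])} if "id" in item else dict(item)
            for item in record["check"]
        ]
    return normalized


# The two sample files from above, inlined as strings for illustration.
file_1 = json.loads('{"name": "test", "check": [{"id": "1"}]}')
file_2 = json.loads('{"name": "test", "check": [{"id": 1}]}')

# After normalization both records carry "id" as a string.
rows = [normalize_check_ids(r) for r in (file_1, file_2)]
```

Once every record uses the same type for "id", the normalized records could, for example, be written back out as JSON and read with spark.read as shown earlier. Whether you normalize before reading or cast inside Spark after reading depends on your data volume.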