I have a Databricks Auto Loader notebook that reads JSON files from an input location and writes a flattened version of those files to an output location. However, the notebook behaves differently in two similar scenarios, described below.
Auto Loader functionality: flattening of JSON files
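For context, the notebook follows roughly the pattern below. The paths, checkpoint/schema location and the flattening step are placeholders, not my exact code:

```python
# Rough sketch of the notebook - paths, checkpoint/schema location and the
# flattening step are placeholders, not the exact production code.
input_path = "/mnt/raw/input_location_1"            # placeholder
output_path = "/mnt/curated/output_location_1"      # placeholder
checkpoint_path = "/mnt/checkpoints/flatten_json"   # placeholder

raw_df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")      # file notification mode
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(input_path)
)

# The real notebook flattens the nested JSON structure into top-level columns
# here (select/explode of nested fields); shown only schematically.
flattened_df = raw_df

(
    flattened_df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(output_path)
)
```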
Scenario 1:
Step a) At start, 'Input location 1' has 3 non-zero-size JSON files (File1.json, File2.json, File3.json) and 24 zero-size JSON files, for a total of 27 JSON files.
The Auto Loader notebook is started. It reads the 3 non-zero-size JSON files (File1.json, File2.json, File3.json), flattens them correctly, and writes the results to 'Output location 1'.
Step b) While the notebook is still running, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 1', bringing the total to 30 JSON files. The notebook reads the 3 additional files, flattens them correctly, and writes the results to 'Output location 1'.
'Output location 1' now contains records for all 6 non-zero-size JSON files (File1.json, File2.json, File3.json, File4.json, File5.json, File6.json).
Scenario 1 works as expected. The problem arises in step b of Scenario 2 (below), even though the two scenarios are similar.
Scenario 2:
Step a) At start, 'Input location 2' has 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) and 24 zero-size JSON files, for a total of 89 JSON files.
The Auto Loader notebook is started. It reads the 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json), flattens them correctly, and writes the results to 'Output location 2'.
Step b) While the notebook is still running, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 2', bringing the total to 92 JSON files. The notebook reads the 3 additional files but does not write their flattened output to 'Output location 2'.
'Output location 2' contains records for the original 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) but no records for the 3 additional files (File4.json, File5.json, File6.json) added in step b of Scenario 2.
Question:
Can anyone provide direction on this issue? Why does the Auto Loader notebook fail to flatten and write the 3 additional non-zero-size JSON files (File4.json, File5.json, File6.json) in step b of Scenario 2, when the same notebook flattens and writes the 3 additional files (File4.json, File5.json, File6.json) correctly in step b of Scenario 1?
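For reference, this is roughly how I could inspect what Auto Loader has actually discovered, assuming the cloud_files_state function is available in my runtime (the checkpoint path is a placeholder matching the sketch above):

```python
# List the files recorded in the stream's checkpoint to see whether the
# 3 additional files were ever discovered by Auto Loader.
# '/mnt/checkpoints/flatten_json' is a placeholder checkpoint path.
discovered = spark.sql(
    "SELECT * FROM cloud_files_state('/mnt/checkpoints/flatten_json')"
)
display(discovered)
```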
Any help is appreciated.
Note: All non-zero-size JSON files are under 25 KB. The Auto Loader notebook discovers/senses input files using the file notification method [option("cloudFiles.useNotifications", "true")].
Thanks,
Sudipta.