I have a Databricks Auto Loader notebook that reads JSON files from an input location and writes a flattened version of those files to an output location. However, the notebook behaves differently in two similar scenarios, described below.
Auto Loader functionality: flattening of JSON files
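For context, the notebook follows roughly the pattern below. The paths, checkpoint/schema location and the flattening step are placeholders, not my exact code:

```python
# Rough sketch of the notebook - paths, checkpoint/schema location and the
# flattening step are placeholders, not the exact production code.
input_path = "/mnt/raw/input_location_1"            # placeholder
output_path = "/mnt/curated/output_location_1"      # placeholder
checkpoint_path = "/mnt/checkpoints/flatten_json"   # placeholder

raw_df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")      # file notification mode
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(input_path)
)

# The real notebook flattens the nested JSON structure into top-level columns
# here (select/explode of nested fields); shown only schematically.
flattened_df = raw_df

(
    flattened_df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(output_path)
)
```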
Scenario 1:
Step a) At start, 'Input location 1' has 3 non-zero-size JSON files (File1.json, File2.json, File3.json) and 24 zero-size JSON files, for a total of 27 JSON files.
The Auto Loader notebook is started. It reads the 3 non-zero-size JSON files (File1.json, File2.json, File3.json), flattens them correctly, and writes the results to 'Output location 1'.
Step b) While the notebook is still running, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 1', bringing the total to 30 JSON files. The notebook reads the 3 additional files, flattens them correctly, and writes the results to 'Output location 1'.
'Output location 1' now contains records for all 6 non-zero-size JSON files (File1.json, File2.json, File3.json, File4.json, File5.json, File6.json).
Scenario 1 works as expected. The problem arises in step b of Scenario 2 (below), even though the two scenarios are similar.
Scenario 2:
Step a) At start, 'Input location 2' has 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) and 24 zero-size JSON files, for a total of 89 JSON files.
The Auto Loader notebook is started. It reads the 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json), flattens them correctly, and writes the results to 'Output location 2'.
Step b) While the notebook is still running, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 2', bringing the total to 92 JSON files. The notebook reads the 3 additional files but does not write their flattened output to 'Output location 2'.
'Output location 2' contains records for the original 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) but no records for the 3 additional files (File4.json, File5.json, File6.json) added in step b of Scenario 2.
Question:
Can anyone provide direction on this issue? Why does the Auto Loader notebook fail to flatten and write the 3 additional non-zero-size JSON files (File4.json, File5.json, File6.json) in step b of Scenario 2, when the same notebook flattens and writes the 3 additional files (File4.json, File5.json, File6.json) correctly in step b of Scenario 1?
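For reference, this is roughly how I could inspect what Auto Loader has actually discovered, assuming the cloud_files_state function is available in my runtime (the checkpoint path is a placeholder matching the sketch above):

```python
# List the files recorded in the stream's checkpoint to see whether the
# 3 additional files were ever discovered by Auto Loader.
# '/mnt/checkpoints/flatten_json' is a placeholder checkpoint path.
discovered = spark.sql(
    "SELECT * FROM cloud_files_state('/mnt/checkpoints/flatten_json')"
)
display(discovered)
```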
Any help is appreciated.
Note: All non-zero-size JSON files are under 25 KB. The Auto Loader notebook discovers/senses input files using the file notification method [option("cloudFiles.useNotifications", "true")].
Thanks,
Sudipta.