Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Auto Loader gets stuck flattening JSON files in two similar scenarios

SudiptaBiswas
New Contributor III

I have a Databricks Auto Loader notebook that reads JSON files from an input location and writes flattened versions of them to an output location. However, the notebook behaves differently in two similar scenarios, described below.
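For readers unfamiliar with the term, "flattening" here presumably means turning nested JSON objects into a single level of columns. The notebook's actual code is not shown in this thread, so the following is only a minimal plain-Python sketch of the idea; the function name `flatten_json`, the `_` separator, and the sample record are all hypothetical.

```python
from typing import Any, Dict


def flatten_json(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten_json(value, full_key, sep))
        else:
            # Scalars and lists are kept unchanged under the joined key.
            flat[full_key] = value
    return flat


# Hypothetical sample record, not taken from the original post.
sample = {"id": 1, "device": {"name": "sensor-a", "location": {"lat": 10.5, "lon": 20.25}}}
flattened = flatten_json(sample)
```

In a real notebook the same effect would more likely be achieved with Spark column expressions on the streaming DataFrame rather than per-record Python.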

Any help is appreciated.

Autoloader functionality - flattening of JSON files

Scenario 1:

Step a) At start: 'Input location 1' has 3 non-zero-size JSON files (File1.json, File2.json, File3.json) and 24 zero-size JSON files, for a total of 27 JSON files.

The Auto Loader notebook is started. It reads the 3 non-zero-size JSON files (File1.json, File2.json, File3.json), flattens them correctly, and writes the result to 'Output location 1'.

Step b) After this, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 1' while the Auto Loader notebook is running. 'Input location 1' now has a total of 30 JSON files. The notebook reads the 3 additional files, flattens them correctly, and writes the result to 'Output location 1'.

'Output location 1' contains records for all 6 non-zero-size JSON files (File1.json, File2.json, File3.json and File4.json, File5.json, File6.json).

No problem with Scenario 1. The problem arises with step b of Scenario 2 (given below). Scenario 1 and Scenario 2 are similar.

Scenario 2:

Step a) At start: 'Input location 2' has 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) and 24 zero-size JSON files, for a total of 89 JSON files.

The Auto Loader notebook is started. It reads the 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json), flattens them correctly, and writes the result to 'Output location 2'.

Step b) After this, 3 more non-zero-size JSON files (File4.json, File5.json, File6.json) are added to 'Input location 2' while the Auto Loader notebook is running. 'Input location 2' now has a total of 92 JSON files. The notebook reads the 3 additional files but does not write their flattened output to 'Output location 2'.

'Output location 2' contains records for the 65 non-zero-size JSON files (File1.json, File2.json, ..., File65.json) but no records for the 3 additional files (File4.json, File5.json, File6.json) added in step b of 'Scenario 2'.
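One detail in the description above may be worth checking. If the file names are literal, the three files added in step b of Scenario 2 (File4.json, File5.json, File6.json) share names with files already processed in step a (File1.json through File65.json), whereas in Scenario 1 the added names are new. By default Auto Loader processes each file path once and does not reprocess a file that is overwritten under the same path unless `cloudFiles.allowOverwrites` is enabled. Whether the names in the post are real or just placeholders is an assumption; the sketch below only compares the names as written, and the helper names are hypothetical.

```python
from typing import List, Set


def file_names(first: int, last: int) -> Set[str]:
    """Build the set File<first>.json .. File<last>.json as named in the post."""
    return {f"File{i}.json" for i in range(first, last + 1)}


def already_seen(initial: Set[str], added: Set[str]) -> List[str]:
    """Return the added names that collide with names already in the input location."""
    return sorted(initial & added)


# Scenario 1: File1-3 present at start, File4-6 added later.
scenario1_overlap = already_seen(file_names(1, 3), file_names(4, 6))

# Scenario 2: File1-65 present at start, File4-6 added later.
scenario2_overlap = already_seen(file_names(1, 65), file_names(4, 6))

print("Scenario 1 collisions:", scenario1_overlap)
print("Scenario 2 collisions:", scenario2_overlap)
```

If the collision is real, giving the new files unique names or enabling `cloudFiles.allowOverwrites` would be the two things to try first.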

Question:

Can anyone please provide any direction to solve this issue? Why can the Auto Loader notebook not flatten and write the 3 additional non-zero-size JSON files (File4.json, File5.json, File6.json) in step b of 'Scenario 2', when the same notebook can flatten and write the 3 additional non-zero-size JSON files (File4.json, File5.json, File6.json) in step b of 'Scenario 1'?

Any help is appreciated.

Note: All non-zero-size JSON files are smaller than 25 KB. The Auto Loader notebook detects input files using file notification mode [option("cloudFiles.useNotifications","true")].
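Since the notebook code is not available, here is a hedged sketch of what an Auto Loader reader using the option mentioned in the note might look like in PySpark. Only `cloudFiles.useNotifications` comes from the post; the helper names, the paths, and the choice to set `cloudFiles.schemaLocation` and `cloudFiles.allowOverwrites` are assumptions for illustration. File notification mode typically also needs cloud-specific options (credentials, queue settings), which are omitted here.

```python
from typing import Dict


def autoloader_options(schema_location: str, allow_overwrites: bool = False) -> Dict[str, str]:
    """Collect Auto Loader options for a JSON source in file notification mode."""
    return {
        "cloudFiles.format": "json",
        # Taken from the post: detect new files via notifications, not directory listing.
        "cloudFiles.useNotifications": "true",
        # Assumption: schema inference state is stored at this location.
        "cloudFiles.schemaLocation": schema_location,
        # Assumption: whether files overwritten under the same path are reprocessed.
        "cloudFiles.allowOverwrites": str(allow_overwrites).lower(),
    }


def build_reader(spark, input_path: str, schema_location: str, allow_overwrites: bool = False):
    """Return a streaming DataFrame; `spark` is an existing SparkSession on Databricks."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location, allow_overwrites))
        .load(input_path)
    )


# Hypothetical path, shown only to illustrate the resulting option values.
opts = autoloader_options("/tmp/hypothetical/schema", allow_overwrites=True)
```

Keeping the options in a plain dictionary makes it easy to print and compare the exact settings used for 'Input location 1' and 'Input location 2'.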

Thanks,

Sudipta.

3 REPLIES

SudiptaBiswas
New Contributor III

Can anyone please provide any suggestions?

#[Azure databricks], #[Databricks autoloader], #Autoloader

jose_gonzalez
Moderator

Could you provide a code snippet? Also, do you see any errors in the driver logs?

Thanks for your reply. I am sorry I cannot provide the code snippet.

If there had been any error (in the driver or elsewhere), the Auto Loader notebook would have stopped, which didn't happen in this case. Please correct me if I am wrong.

The Auto Loader notebook has been running continuously since it was started in step a of 'Scenario 2', including throughout step b.
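A query that keeps running does not by itself show whether the new files were picked up. One way to check without sharing the notebook is to look at the streaming query's own progress information. The sketch below assumes `query` is the `StreamingQuery` handle returned by `writeStream...start()` in the notebook and that `lastProgress` can be indexed like a dictionary; the helper name `summarize_query` is hypothetical.

```python
from typing import Any, Dict


def summarize_query(query: Any) -> Dict[str, Any]:
    """Collect a few health indicators from a Structured Streaming query handle."""
    progress = query.lastProgress  # None until the first micro-batch has completed
    error = query.exception()      # None if the query has not failed
    return {
        "is_active": query.isActive,
        "status": query.status,
        # Rows read in the most recent micro-batch; 0 suggests no new files were seen.
        "last_batch_input_rows": progress["numInputRows"] if progress else None,
        "exception": str(error) if error is not None else None,
    }
```

Calling `summarize_query(query)` shortly after adding the files in step b would show whether the latest micro-batch read any rows, which helps separate "files were never detected" from "files were read but not written".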
