Recommendations for loading table from two different folder paths using Autoloader and DLT

bblakey
New Contributor II

I have a new (bronze) table that I want to write to. The initial table load (refresh) CSV file is placed in folder a, and the incremental-change (insert/update/delete) CSV files are placed in folder b. I've written a notebook that can load one OR the other, but not both.

My intention is to load the table initially (from folder a), then consume data changes (from folder b) as they arrive and apply_changes them to the table loaded from folder a. So: one target table with two source folders.

What is the recommended approach here? What would be a good ingestion pattern for something like this?
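One common pattern for the situation described above is to define each folder as its own Auto Loader source in a DLT view, union them into a single change feed, and let apply_changes maintain the target table. A minimal sketch follows; the folder paths, the `id` key, and the `operation`/`sequence_num` columns are all assumptions to adjust to your actual data, and this only runs inside a Delta Live Tables pipeline:

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical landing paths; adjust to your storage layout.
FULL_PATH = "/mnt/landing/folder_a"   # initial/refresh load
INCR_PATH = "/mnt/landing/folder_b"   # incremental changes

@dlt.view
def full_load():
    # Treat every row of the initial load as an insert at sequence 0.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .load(FULL_PATH)
            .withColumn("operation", F.lit("insert"))
            .withColumn("sequence_num", F.lit(0)))

@dlt.view
def incremental_load():
    # Assumes the change files already carry operation/sequence columns.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .load(INCR_PATH))

@dlt.view
def combined_feed():
    # Both sides must expose the same columns for the union to succeed.
    return dlt.read_stream("full_load").unionByName(
        dlt.read_stream("incremental_load"))

dlt.create_streaming_table("bronze_target")

dlt.apply_changes(
    target="bronze_target",
    source="combined_feed",
    keys=["id"],                  # hypothetical primary key
    sequence_by="sequence_num",   # hypothetical ordering column
    apply_as_deletes=F.expr("operation = 'delete'"),
    except_column_list=["operation", "sequence_num"],
)
```

Because the initial load is stamped with sequence 0, any change arriving later with a higher sequence number wins, which is what lets one apply_changes target consume both folders.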

3 REPLIES

Kaniz
Community Manager

Hi @Bill Blakey, does this Stack Overflow thread help you find your solution?

bblakey
New Contributor II

Kaniz, thank you for the response. Perhaps this can help; I need to do more reading on ThreadPoolExecutor with Spark. The other "minor" issue I did not mention is that the files in each folder have a few mutually exclusive metadata columns, which I either exclude/omit or synthesize with a withColumn call. The scenario I'm trying to accommodate is the D365 Export to Data Lake, which seems like it should be straightforward but really isn't.
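For the mutually exclusive metadata columns, the usual trick before a union is to add each side's missing columns as nulls (in Spark, `df.withColumn(col, F.lit(None))` per missing column, or `unionByName(..., allowMissingColumns=True)`). The idea in plain Python, with made-up column names:

```python
# Plain-Python illustration of aligning two record shapes before a union;
# in Spark each missing column would be synthesized with withColumn(F.lit(None)).
def align(rows, all_cols):
    """Return rows with every column present, filling absent ones with None."""
    return [{c: r.get(c) for c in all_cols} for r in rows]

full_rows = [{"id": 1, "name": "a"}]                       # folder a shape
incr_rows = [{"id": 2, "name": "b", "cdc_op": "insert"}]   # folder b shape

# Union of all column names seen on either side.
all_cols = sorted({c for r in full_rows + incr_rows for c in r})
unioned = align(full_rows, all_cols) + align(incr_rows, all_cols)
# unioned[0] == {"cdc_op": None, "id": 1, "name": "a"}
```

Once both sides share the same columns, the combined stream can feed a single target.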

Kaniz
Community Manager

Hi @Bill Blakey, thank you for your response.
