Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reprocessing old data stored in ADLS

Adigkar
New Contributor

Hi,

We have a requirement to reprocess old data using a Data Factory pipeline. Here are the details.

Storage is in ADLS Gen2:

Landing zone: data is stored in the same format as we receive it from the source. It is loaded from SQL Server to ADLS Gen2 using a pipeline copy activity.

Bronze layer: data from the landing zone is copied to the bronze layer by converting it to Delta tables. This is done using Azure Databricks notebooks that run PySpark code.

Silver and gold layers: these run Databricks notebook Python code.

We receive data daily as files. The landing zone keeps an archive of that data for 7 days, whereas the bronze layer is truncated and reloaded every day.

We need to build reprocessing logic so that, if we pass a date as a parameter, it triggers the flow, picks up the old files for that date, and starts processing from the landing zone. Could you please help me with this?

3 REPLIES

Hkesharwani
Contributor II

Hi,
The approach is somewhat similar to an incremental load.
To reprocess old data from ADLS, the data must be identifiable by date: either the files are stored in a folder structure such as YYYY/MM/DD/<file>, or the file name contains the date of the file.
The date can then be passed to the notebook using a widget, and the matching files located based on the folder structure or the file name.
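
As a rough illustration of the idea above, here is a minimal sketch that turns a date parameter into a landing-zone folder path and rejects dates outside the 7-day archive mentioned in the question. The container name, storage account placeholder, `sales` folder, `YYYY/MM/DD` layout, and the function name `landing_path_for` are all assumptions made for the example, not details from the original setup.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical root path: replace the container, account, and folder with your own.
LANDING_ROOT = "abfss://landing@<storage-account>.dfs.core.windows.net/sales"

# The question states the landing zone archives files for 7 days.
# Treating a file exactly 7 days old as still available is an assumption.
RETENTION_DAYS = 7


def landing_path_for(run_date: str, today: Optional[date] = None) -> str:
    """Return the landing-zone folder for run_date (expected as YYYY-MM-DD).

    Raises ValueError if the date is malformed, in the future,
    or older than the assumed retention window.
    """
    parsed = datetime.strptime(run_date, "%Y-%m-%d").date()
    today = today or date.today()
    age_days = (today - parsed).days

    if age_days < 0:
        raise ValueError(f"{run_date} is in the future")
    if age_days > RETENTION_DAYS:
        raise ValueError(
            f"{run_date} is {age_days} days old; the landing zone only "
            f"keeps {RETENTION_DAYS} days of files"
        )

    # Assumed folder layout: <root>/YYYY/MM/DD/
    return f"{LANDING_ROOT}/{parsed:%Y/%m/%d}/"
```

In a Databricks notebook, the date string would typically come from a widget, for example `dbutils.widgets.get("run_date")` (the widget name is hypothetical), and a Data Factory Notebook activity can supply that value through its base parameters. The returned path could then be handed to the existing bronze load, which can keep overwriting the bronze table since that layer is truncate and load anyway.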

Harshit Kesharwani
Self-taught Data Engineer | Seeking Remote Full-time Opportunities

Hkesharwani
Contributor II

@Kaniz_Fatma I just posted a possible solution to the above problem, and it was rejected by a community moderator without any explanation.
This has happened to me twice in the past as well.
Can you please help with this?

Harshit Kesharwani
Self-taught Data Engineer | Seeking Remote Full-time Opportunities

Hi @Hkesharwani, Thank you for your active participation in the Databricks community! We appreciate your contributions and insights.

Regarding your recent post, I apologize for any inconvenience caused by the rejection of your post.

I’ve reviewed your response and taken action to repost your replies, and you should now be able to repost them without any issues. Please feel free to share your solution again, and don’t hesitate to reach out if you encounter any further difficulties.

Thank you for your patience and understanding. We value your contributions and look forward to seeing more of your insights in the community!
