Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reprocessing old data stored in ADLS

Adigkar
New Contributor

Hi,

We have a requirement to reprocess old data using a Data Factory pipeline. Here are the details:

Storage: ADLS Gen2

Landing zone: data is stored in the same format as we receive it from the source. It is loaded from SQL Server into ADLS Gen2 using a Data Factory pipeline Copy activity.

Bronze layer: data from the landing zone is copied to the bronze layer and converted to Delta tables. This is done by Azure Databricks notebooks that run PySpark code.

Silver and gold layers: built by Databricks notebooks running Python code.
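The flow above (landing zone, then bronze, then silver/gold) can be seen as a chain of steps that all take the processing date as their only input, which is what makes a date-driven reprocess possible. Below is a minimal sketch in plain Python, assuming each layer is wrapped in a callable; `run_layers` and the stage functions are hypothetical names standing in for the Databricks notebooks that the Data Factory pipeline would call in sequence.

```python
from datetime import date
from typing import Callable, List, Tuple

# Hypothetical stage type: (layer name, function that processes one date).
# In the real setup each function would stand in for a Databricks notebook
# that the Data Factory pipeline triggers with the date as a parameter.
Stage = Tuple[str, Callable[[date], None]]


def run_layers(process_date: date, stages: List[Stage]) -> List[str]:
    """Run the layers in order for one date and return the names that ran.

    Any exception raised by a layer propagates immediately, so downstream
    layers never run on top of a failed upstream load.
    """
    completed: List[str] = []
    for name, step in stages:
        step(process_date)
        completed.append(name)
    return completed


# Usage: each placeholder stage just prints what it would do for the date.
stages = [
    ("bronze", lambda d: print(f"landing -> bronze (overwrite) for {d}")),
    ("silver", lambda d: print(f"bronze -> silver for {d}")),
    ("gold", lambda d: print(f"silver -> gold for {d}")),
]
print(run_layers(date(2024, 3, 5), stages))  # ['bronze', 'silver', 'gold']
```

Because the daily run and a reprocess differ only in the date passed in, the same chain can serve both.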

Our situation: we receive data daily as files. The landing zone keeps an archive of these files for 7 days, whereas the bronze layer is truncated and reloaded every day.
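Since bronze only ever holds the latest load, any reprocess has to start from the landing zone, and that only works while the requested date is still inside the 7-day archive. A minimal check might look like the sketch below. It assumes "7 days" means today plus the six previous daily loads; `is_reprocessable` and `LANDING_RETENTION_DAYS` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional

# Retention of the landing-zone archive, as described in the question.
LANDING_RETENTION_DAYS = 7


def is_reprocessable(requested: date,
                     today: Optional[date] = None,
                     retention_days: int = LANDING_RETENTION_DAYS) -> bool:
    """Return True if files for `requested` should still be in the landing zone.

    Assumption: the archive holds today's load plus the previous
    (retention_days - 1) daily loads.
    """
    if today is None:
        today = date.today()
    oldest_kept = today - timedelta(days=retention_days - 1)
    return oldest_kept <= requested <= today
```

Failing fast here gives a clear error instead of a pipeline run that silently finds no files.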


We need to build reprocessing logic: when we pass a date as a parameter, it should trigger the flow, pick up the old files for that date from the landing zone, and process them from there onwards. Could you please help me with this?
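One way to avoid a separate reprocess pipeline is to give the existing one an optional date parameter: empty means a normal daily run, a value means reprocess that day. The sketch below shows only the parameter handling in plain Python. In a Databricks notebook the raw string would typically come from `dbutils.widgets.get(...)`; the widget name `process_date` and the function `resolve_process_date` are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def resolve_process_date(raw: Optional[str], today: Optional[date] = None) -> date:
    """Turn the pipeline's date parameter into the date to process.

    - Empty or missing value: normal daily run, use today's date.
    - Otherwise: must be an ISO date such as "2024-03-05" (reprocess run).
    In a notebook, `raw` might come from dbutils.widgets.get("process_date")
    (hypothetical widget name).
    """
    if today is None:
        today = date.today()
    if raw is None or not raw.strip():
        return today
    try:
        requested = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"expected YYYY-MM-DD, got {raw!r}") from exc
    if requested > today:
        raise ValueError(f"cannot reprocess a future date: {requested}")
    return requested
```

The resolved date is then the single value handed to every downstream step, so bronze, silver, and gold all agree on which day they are working on.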


Hkesharwani
Contributor II

Hi, 
The approach is similar to an incremental load.
To reprocess old data from ADLS, the files must be identifiable by date: either store them in a YYYY/MM/DD folder structure (year, then month, then day, then the file), or include the date in the file name.
The date can then be passed to the notebook as a widget parameter, and the matching files located from the folder structure or the file name.
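Both options from this answer can be sketched as small helpers: build the day folder from the date, or filter a directory listing by a date embedded in the file name. This is a hedged sketch; the `abfss://` base path and the names `landing_folder` and `files_for_date` are hypothetical, and in Databricks the file names could come from `dbutils.fs.ls(folder)` (each entry has a `.name`) before the selected paths are read with Spark.

```python
import re
from datetime import date
from typing import Iterable, List


def landing_folder(base: str, d: date) -> str:
    """Folder for one day under a YYYY/MM/DD layout (layout is an assumption)."""
    return f"{base.rstrip('/')}/{d:%Y}/{d:%m}/{d:%d}"


def files_for_date(file_names: Iterable[str], d: date) -> List[str]:
    """Keep files whose name contains the date as YYYYMMDD or YYYY-MM-DD."""
    pattern = re.compile(rf"{d:%Y}-?{d:%m}-?{d:%d}")
    return [name for name in file_names if pattern.search(name)]


# Usage with a hypothetical storage account and container.
base = "abfss://landing@mystorageaccount.dfs.core.windows.net/sales"
print(landing_folder(base, date(2024, 3, 5)))
# abfss://landing@mystorageaccount.dfs.core.windows.net/sales/2024/03/05
print(files_for_date(["orders_20240305.csv", "orders_20240306.csv"], date(2024, 3, 5)))
# ['orders_20240305.csv']
```

The folder layout is usually the simpler choice, since a reprocess then reads one directory instead of scanning and filtering every file in the archive.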

Harshit Kesharwani
Data engineer at Rsystema

Hkesharwani
Contributor II

@Retired_mod I posted a possible solution to the problem above, and it was rejected by a community moderator without any explanation.
This has happened to me twice before as well.
Can you please help with this?

