<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reprocess of old data stored in adls in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71621#M34355</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;BR /&gt;The approach is similar to an incremental load.&lt;BR /&gt;To reprocess old data from ADLS, the files must be identifiable by date: either store them in a YYYY-&amp;gt;MM-&amp;gt;DD-&amp;gt;File folder structure, or include the date in the file name.&lt;BR /&gt;The date can then be passed to the notebook through a widget and used to locate the file from the folder structure or the file name.&lt;/P&gt;</description>
    <pubDate>Tue, 04 Jun 2024 14:17:40 GMT</pubDate>
    <dc:creator>Hkesharwani</dc:creator>
    <dc:date>2024-06-04T14:17:40Z</dc:date>
    <item>
      <title>Reprocess of old data stored in adls</title>
      <link>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71574#M34349</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have a requirement to reprocess old data using a Data Factory pipeline. Here are the details:&lt;/P&gt;&lt;P&gt;Storage: ADLS Gen2&lt;BR /&gt;Landing zone (data is stored in the same format as we receive it from the source; it is loaded from SQL Server to ADLS Gen2 using a data pipeline copy activity)&lt;/P&gt;&lt;P&gt;Bronze layer (data from the landing zone is copied to the bronze layer and converted to Delta tables; this is done by Azure Databricks notebooks that run PySpark code)&lt;/P&gt;&lt;P&gt;Silver and gold layers (run Databricks notebook Python code)&lt;/P&gt;&lt;P&gt;Our requirement is as follows: we receive data daily as files. The landing zone keeps an archive of that data for 7 days, whereas the bronze layer is truncated and reloaded every day.&lt;/P&gt;&lt;P&gt;We need to build reprocessing logic so that, when we pass a date as a parameter, it triggers the flow, picks up the old files for that date, and starts processing from the landing zone. Could you please help me with this?&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 09:40:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71574#M34349</guid>
      <dc:creator>Adigkar</dc:creator>
      <dc:date>2024-06-04T09:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Reprocess of old data stored in adls</title>
      <link>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71621#M34355</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;BR /&gt;The approach is similar to an incremental load.&lt;BR /&gt;To reprocess old data from ADLS, the files must be identifiable by date: either store them in a YYYY-&amp;gt;MM-&amp;gt;DD-&amp;gt;File folder structure, or include the date in the file name.&lt;BR /&gt;The date can then be passed to the notebook through a widget and used to locate the file from the folder structure or the file name.&lt;/P&gt;</description>
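      <content:encoded><![CDATA[<P>The date-based lookup described in this reply could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the base path, the folder layout YYYY/MM/DD, the widget name run_date, and the helper landing_path_for are all hypothetical and not taken from the thread. The 7-day limit mirrors the landing zone archive mentioned in the question; whether the boundary day is still available is also an assumption.</P>

```python
from datetime import date, datetime

# Assumption: landing files live under <base>/YYYY/MM/DD/.
# The base path is a hypothetical placeholder, not taken from the thread.
LANDING_BASE = "abfss://landing@<storage-account>.dfs.core.windows.net/sales"
RETENTION_DAYS = 7  # the question states the landing zone keeps 7 days of files


def landing_path_for(run_date: str, today: date, base: str = LANDING_BASE) -> str:
    """Return the landing folder for run_date (YYYY-MM-DD), or raise ValueError."""
    # strptime raises ValueError if the parameter is not a valid YYYY-MM-DD date.
    parsed = datetime.strptime(run_date, "%Y-%m-%d").date()
    age = (today - parsed).days
    if age < 0:
        raise ValueError(f"{run_date} is in the future")
    if age > RETENTION_DAYS:
        raise ValueError(f"{run_date} is older than the {RETENTION_DAYS}-day archive")
    return f"{base}/{parsed:%Y/%m/%d}/"


# In a Databricks notebook the date would typically come from a widget, e.g.:
#   dbutils.widgets.text("run_date", "")
#   run_date = dbutils.widgets.get("run_date")
# and the returned folder would then be read and loaded into the bronze table.
print(landing_path_for("2024-06-01", today=date(2024, 6, 4)))
```

<P>Failing early on a date outside the archive window avoids truncating the bronze table for a day whose landing files no longer exist. If the date is carried in the file name instead of the folder structure, the same parsed date could be formatted into a file name pattern.</P>]]></content:encoded>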
      <pubDate>Tue, 04 Jun 2024 14:17:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71621#M34355</guid>
      <dc:creator>Hkesharwani</dc:creator>
      <dc:date>2024-06-04T14:17:40Z</dc:date>
    </item>
    <item>
      <title>Re: Reprocess of old data stored in adls</title>
      <link>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71622#M34356</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;I just posted a possible solution to the above problem, and it was rejected by a community moderator without any explanation.&lt;BR /&gt;This has happened to me twice in the past as well.&lt;BR /&gt;Could you please help with this?&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 14:29:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reprocess-of-old-data-stored-in-adls/m-p/71622#M34356</guid>
      <dc:creator>Hkesharwani</dc:creator>
      <dc:date>2024-06-04T14:29:12Z</dc:date>
    </item>
  </channel>
</rss>

