<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT overwrite part of the table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48935#M28427</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84270"&gt;@erigaud&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using jobs/workflows would be the right choice for this.&lt;/P&gt;</description>
    <pubDate>Wed, 11 Oct 2023 09:50:41 GMT</pubDate>
    <dc:creator>Tharun-Kumar</dc:creator>
    <dc:date>2023-10-11T09:50:41Z</dc:date>
    <item>
      <title>DLT overwrite part of the table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48338#M28281</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;We're currently building a file ingestion pipeline using Delta Live Tables and Auto Loader.&lt;/P&gt;&lt;P&gt;The bronze tables have roughly the following schema:&lt;/P&gt;&lt;P&gt;file_name | file_upload_date | colA | colB&lt;/P&gt;&lt;P&gt;(Well, there are actually 250+ columns, but you get the idea.)&lt;/P&gt;&lt;P&gt;The bronze table is append-only and may contain duplicates, because a file can be uploaded several times with corrections under the same name. The logic I'm trying to implement for the silver table is the following:&lt;/P&gt;&lt;P&gt;- A file is loaded into bronze, let's say 500 rows with file_name = file_name_A.csv and the corresponding file_upload_date (that part is fine, just standard Auto Loader).&lt;/P&gt;&lt;P&gt;- In silver we already have some rows (let's say 1000) for that file_name, but with an older file_upload_date. In that case we want to replace all 1000 rows with the newer 500 rows.&lt;/P&gt;&lt;P&gt;How would someone go about doing something like this using Delta Live Tables?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 05 Oct 2023 08:19:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48338#M28281</guid>
      <dc:creator>erigaud</dc:creator>
      <dc:date>2023-10-05T08:19:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT overwrite part of the table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48447#M28308</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;. I am not sure MERGE INTO is the right solution to my problem, because I do not have a unique key here: many incoming rows with the same file_name would match many rows in my table.&lt;/P&gt;&lt;P&gt;Do you have a better solution?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Oct 2023 19:16:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48447#M28308</guid>
      <dc:creator>erigaud</dc:creator>
      <dc:date>2023-10-05T19:16:16Z</dc:date>
    </item>
    <item>
      <title>Re: DLT overwrite part of the table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48530#M28322</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84270"&gt;@erigaud&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could get the distinct file names from the new set of records and delete all of their existing entries from your silver table. The new records can then be appended to the silver table.&lt;/P&gt;</description>
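<!--
The delete-then-append approach described in the reply above can be sketched in plain Python, as a model of the logic only. The function name replace_by_file_name and the sample rows are hypothetical, rows are modelled as dicts, and the sketch assumes the incoming batch is newer than whatever silver already holds for the same file. It is not DLT or Delta Lake code.

```python
def replace_by_file_name(silver_rows, new_rows):
    """Replace every file present in new_rows with its newest upload.

    Hypothetical helper: rows are dicts with at least the keys
    "file_name" and "file_upload_date" (ISO date strings, so plain
    string comparison orders them chronologically).
    """
    # Newest upload date per file name in the incoming batch.
    latest = {}
    for row in new_rows:
        name, date = row["file_name"], row["file_upload_date"]
        if name not in latest or date > latest[name]:
            latest[name] = date

    # Step 1: drop all silver rows whose file_name appears in the batch.
    # Assumption: the batch is newer than what silver holds for that file.
    kept = [r for r in silver_rows if r["file_name"] not in latest]

    # Step 2: append only the batch rows that belong to the newest upload.
    fresh = [
        r for r in new_rows
        if r["file_upload_date"] == latest[r["file_name"]]
    ]
    return kept + fresh


if __name__ == "__main__":
    silver = [
        {"file_name": "file_name_A.csv", "file_upload_date": "2023-10-01", "colA": 1},
        {"file_name": "file_name_A.csv", "file_upload_date": "2023-10-01", "colA": 2},
        {"file_name": "file_name_B.csv", "file_upload_date": "2023-10-01", "colA": 3},
    ]
    batch = [
        {"file_name": "file_name_A.csv", "file_upload_date": "2023-10-05", "colA": 10},
    ]
    for row in replace_by_file_name(silver, batch):
        print(row)
```

On an actual Delta table, the same two steps would correspond to a delete on the matching file_name values followed by an append of the new rows. Filtering the batch down to the newest upload per file keeps repeated uploads inside a single batch from reintroducing duplicates.
-->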
      <pubDate>Fri, 06 Oct 2023 04:36:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48530#M28322</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-10-06T04:36:29Z</dc:date>
    </item>
    <item>
      <title>Re: DLT overwrite part of the table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48542#M28327</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/39403"&gt;@Tharun-Kumar&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That's the solution I was thinking of, but is there a clean way to do that using DLT, or should I just use a regular notebook task and plain Delta tables?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Oct 2023 05:57:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48542#M28327</guid>
      <dc:creator>erigaud</dc:creator>
      <dc:date>2023-10-06T05:57:51Z</dc:date>
    </item>
    <item>
      <title>Re: DLT overwrite part of the table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48935#M28427</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84270"&gt;@erigaud&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using jobs/workflows would be the right choice for this.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2023 09:50:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-overwrite-part-of-the-table/m-p/48935#M28427</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-10-11T09:50:41Z</dc:date>
    </item>
  </channel>
</rss>

