<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to maintain Primary Key Column in Databricks Delta Multi Cluster environment in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27828#M19676</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am trying to replicate the SQL database feature of maintaining primary keys in Databricks Delta, where the data is written to blob storage such as ADLS2 or AWS S3.&lt;/P&gt;
&lt;P&gt;I want an auto-incremented primary key feature using Databricks Delta.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Existing approach:&lt;/B&gt; we derive new primary keys from the latest row count. However, this approach does not work in a parallel processing environment, where it produces duplicate primary keys.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 25 Aug 2019 11:47:41 GMT</pubDate>
    <dc:creator>AdityaDeshpande</dc:creator>
    <dc:date>2019-08-25T11:47:41Z</dc:date>
    <item>
      <title>How to maintain Primary Key Column in Databricks Delta Multi Cluster environment</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27828#M19676</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am trying to replicate the SQL database feature of maintaining primary keys in Databricks Delta, where the data is written to blob storage such as ADLS2 or AWS S3.&lt;/P&gt;
&lt;P&gt;I want an auto-incremented primary key feature using Databricks Delta.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Existing approach:&lt;/B&gt; we derive new primary keys from the latest row count. However, this approach does not work in a parallel processing environment, where it produces duplicate primary keys.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 25 Aug 2019 11:47:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27828#M19676</guid>
      <dc:creator>AdityaDeshpande</dc:creator>
      <dc:date>2019-08-25T11:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to maintain Primary Key Column in Databricks Delta Multi Cluster environment</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27829#M19677</link>
      <description>&lt;P&gt;Hi @Aditya Deshpande,&lt;/P&gt;&lt;P&gt;Delta has no locking mechanism for primary keys. You can use the row_number() function on the DataFrame to generate the keys, call distinct() before the write, and then save the result as Delta.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2019 14:13:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27829#M19677</guid>
      <dc:creator>girivaratharaja</dc:creator>
      <dc:date>2019-08-26T14:13:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to maintain Primary Key Column in Databricks Delta Multi Cluster environment</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27830#M19678</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This is the approach we are already using. However, it breaks down in a multi-cluster environment where several clusters write data to the same destination.&lt;/P&gt;
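&lt;P&gt;The limitation described above can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions: it uses no Spark or Delta APIs, the helper assign_keys and the in-memory table are hypothetical stand-ins, and the key scheme (current maximum key plus a 1-based row number) is an assumed reading of the row-number approach discussed in this thread. It shows two writers that read the same snapshot and therefore hand out the same key.&lt;/P&gt;

```python
# Plain-Python simulation only; no Spark or Delta involved.
# assign_keys and the in-memory table are hypothetical illustrations.

def assign_keys(snapshot, new_rows):
    """Give each new row a key of (max existing key) + its 1-based row number."""
    start = max((key for key, _ in snapshot), default=0)
    return [(start + n, row) for n, row in enumerate(new_rows, start=1)]

# Shared destination table holding (key, value) pairs.
table = [(1, "a"), (2, "b")]

# Both writers read the same snapshot before either one commits.
snapshot_a = list(table)
snapshot_b = list(table)

batch_a = assign_keys(snapshot_a, ["c", "d"])
batch_b = assign_keys(snapshot_b, ["e"])

# Both appends succeed because nothing enforces key uniqueness.
table.extend(batch_a)
table.extend(batch_b)

keys = [key for key, _ in table]
duplicates = sorted({k for k in keys if keys.count(k) != 1})
print(duplicates)  # prints [3]
```

&lt;P&gt;Each writer's batch is internally consistent, so a distinct() within one batch would not catch the collision; the duplicate only appears once both batches land in the same destination.&lt;/P&gt;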
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 07:09:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-maintain-primary-key-column-in-databricks-delta-multi/m-p/27830#M19678</guid>
      <dc:creator>AdityaDeshpande</dc:creator>
      <dc:date>2019-08-27T07:09:49Z</dc:date>
    </item>
  </channel>
</rss>

