<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Primary key constraint not working in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153916#M54029</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/184264"&gt;@AanchalSoni&lt;/a&gt;&amp;nbsp;In Databricks, primary and foreign key constraints are informational only: Delta tables are not relational entities like tables in a traditional RDBMS, so these constraints are recorded but not enforced. You can apply NOT NULL constraints or set expectations in Lakeflow to control execution and data quality.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Apr 2026 13:02:51 GMT</pubDate>
    <dc:creator>Sumit_7</dc:creator>
    <dc:date>2026-04-09T13:02:51Z</dc:date>
    <item>
      <title>Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153896#M54026</link>
      <description>&lt;P&gt;I've created a Lakeflow job to run 5 notebook tasks, one for each silver table: Customers, Accounts, Transactions, Loans, and Branches.&lt;/P&gt;&lt;P&gt;In the Customers notebook, after writing the data to a Delta table with Auto Loader, I apply the NOT NULL and primary key constraints on customer_id:&lt;/P&gt;&lt;PRE&gt;spark.sql("ALTER TABLE capstone_project.silver.customers_silver ALTER COLUMN customer_id SET NOT NULL")&lt;BR /&gt;spark.sql("ALTER TABLE capstone_project.silver.customers_silver ADD CONSTRAINT pk_customer_id PRIMARY KEY (customer_id)")&lt;/PRE&gt;&lt;P&gt;The other tables have foreign key constraints on customer_id defined in a similar fashion, and their notebooks depend on the Customers notebook.&lt;/P&gt;&lt;P&gt;After the Customers notebook finishes successfully, the other notebooks fail with an error saying there is no primary key constraint on the customers table. And indeed, when I check the customers table, there is no primary key.&lt;/P&gt;&lt;P&gt;But the constraint code did execute and returned a DataFrame, so where did the primary key go? I'm unable to identify the root cause. Please assist.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 12:12:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153896#M54026</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-09T12:12:02Z</dc:date>
    </item>
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153916#M54029</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/184264"&gt;@AanchalSoni&lt;/a&gt;&amp;nbsp;In Databricks, primary and foreign key constraints are informational only: Delta tables are not relational entities like tables in a traditional RDBMS, so these constraints are recorded but not enforced. You can apply NOT NULL constraints or set expectations in Lakeflow to control execution and data quality.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 13:02:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153916#M54029</guid>
      <dc:creator>Sumit_7</dc:creator>
      <dc:date>2026-04-09T13:02:51Z</dc:date>
    </item>
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153933#M54030</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/184264"&gt;@AanchalSoni&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Primary &amp;amp; foreign keys in Unity Catalog are informational only. Databricks does not enforce uniqueness during writes, to avoid the massive performance overhead; you are responsible for maintaining data integrity in the pipeline logic. The table definition is just a blueprint for documentation and optimization. You can use the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;MERGE:&lt;/STRONG&gt;&amp;nbsp;instead of INSERT, use MERGE to upsert records; it's the standard way to prevent duplicates in Delta Lake.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Deduplication&lt;/STRONG&gt;: filter with ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC) = 1 in your code before writing to the tables.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Check constraints&lt;/STRONG&gt;: while primary &amp;amp; foreign keys are not enforced, NOT NULL and CHECK constraints are strictly enforced.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 09 Apr 2026 14:29:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153933#M54030</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-04-09T14:29:07Z</dc:date>
    </item>
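
The MERGE and deduplication patterns suggested above can be sketched in SQL. This is a minimal illustration reusing the thread's customers_silver table; the source table (customers_raw) and ordering column (ingest_ts) are hypothetical placeholders, not names from the original post:

```sql
-- Deduplicate in a staging view: keep only the latest row per customer_id.
-- (capstone_project.bronze.customers_raw and ingest_ts are assumed names.)
CREATE OR REPLACE TEMP VIEW customers_updates AS
SELECT * EXCEPT (rn) FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ingest_ts DESC) AS rn
  FROM capstone_project.bronze.customers_raw
) WHERE rn = 1;

-- Upsert instead of INSERT: existing keys are updated, new keys are inserted,
-- so the target never accumulates duplicate customer_id values.
MERGE INTO capstone_project.silver.customers_silver AS t
USING customers_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

The SELECT * EXCEPT (rn) drops the helper column so UPDATE SET * / INSERT * match the target schema.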
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153949#M54035</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175319"&gt;@Sumit_7&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt;&amp;nbsp;for the clarification. I had read the same about primary and foreign keys, but got confused when I saw the table Overview: it distinctly highlights PK for customer_id. Please check the screenshot. Also, I was getting errors for the other tables when they ran before the customers table: 'primary key doesn't exist in customers table.'&amp;nbsp;&lt;/P&gt;&lt;P&gt;What does "informational" mean?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 15:53:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153949#M54035</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-09T15:53:10Z</dc:date>
    </item>
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153951#M54036</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;If you're using declarative pipelines, Databricks respects the definition you gave in the table declaration. If you subsequently run an ALTER TABLE statement in the same pipeline, it won't be preserved.&lt;/P&gt;
&lt;P&gt;What you should do is either set the primary key in the declarative pipeline table definition or add the ALTER TABLE as a separate task in the job. Although Databricks doesn't enforce primary key constraints, it should still be aware of them.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;CREATE OR REFRESH STREAMING TABLE orders_valid(
  CONSTRAINT pk_customer_id PRIMARY KEY (customer_id)
)
AS SELECT * FROM STREAM read_files("/databricks-datasets/retail-org/sales_orders");&lt;/LI-CODE&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;&lt;BR /&gt;Emma&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 15:53:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153951#M54036</guid>
      <dc:creator>emma_s</dc:creator>
      <dc:date>2026-04-09T15:53:49Z</dc:date>
    </item>
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153952#M54037</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176516"&gt;@emma_s&lt;/a&gt;&amp;nbsp;! This helps!&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 15:57:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153952#M54037</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-09T15:57:07Z</dc:date>
    </item>
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153957#M54042</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/184264"&gt;@AanchalSoni&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Capturing columns as primary keys helps users and tools understand relationships in the data. You can also create a primary key with RELY, which lets the optimizer skip redundant operations in some cases:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Distinct elimination&lt;/STRONG&gt;&lt;BR /&gt;When you apply a DISTINCT operator to a column marked as a PRIMARY KEY with RELY, the optimizer knows every value is already unique, so it skips the expensive shuffle and sort that deduplication would require.&lt;/P&gt;&lt;P&gt;SELECT DISTINCT p_product_id FROM products will be treated as SELECT p_product_id FROM products, removing the overhead.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Join elimination&lt;/STRONG&gt;&lt;BR /&gt;In a LEFT OUTER JOIN where you only select columns from the fact table, the optimizer can eliminate the join if it knows the dimension table is joined on a unique primary key.&lt;/P&gt;&lt;PRE&gt;SELECT SUM(oi_sales_price)&lt;BR /&gt;FROM order_items oi&lt;BR /&gt;LEFT JOIN promotions p ON oi.promo_id = p.p_promo_id;&lt;/PRE&gt;&lt;P&gt;If p_promo_id is a RELY primary key, the optimizer realizes the join doesn't change the SUM result and executes only the scan on the fact table: SELECT SUM(oi_sales_price) FROM order_items oi;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 09 Apr 2026 16:15:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153957#M54042</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-04-09T16:15:08Z</dc:date>
    </item>
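
Declaring a RELY primary key as described above can be sketched against the thread's own table. Note that RELY is a promise to the optimizer, not a check: you must guarantee uniqueness yourself (e.g. via the MERGE pattern), or queries relying on the constraint may return wrong results.

```sql
-- Replace the plain informational constraint with a RELY one, so the
-- optimizer may use it for DISTINCT elimination and join elimination.
ALTER TABLE capstone_project.silver.customers_silver
  DROP CONSTRAINT IF EXISTS pk_customer_id;
ALTER TABLE capstone_project.silver.customers_silver
  ADD CONSTRAINT pk_customer_id PRIMARY KEY (customer_id) RELY;
```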
    <item>
      <title>Re: Primary key constraint not working</title>
      <link>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153960#M54044</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt;&amp;nbsp;thanks again!&lt;/P&gt;&lt;P&gt;The explanation totally makes sense now.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 16:20:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/primary-key-constraint-not-working/m-p/153960#M54044</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-09T16:20:58Z</dc:date>
    </item>
  </channel>
</rss>

