<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Identity column has null values in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/149000#M53002</link>
    <description>&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/limitations" target="_self"&gt;Identity columns have certain limitations&lt;/A&gt; with SDP . Here is what the documentation says .&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Identity columns are not supported with tables that are the target of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.databricks.com/aws/en/ldp/cdc" target="_blank" rel="noopener"&gt;AUTO CDC&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;processing.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Because identity columns aren’t supported on targets of AUTO CDC (create_auto_cdc_from_snapshot_flow/APPLY CHANGES), the engine won’t generate values during the SCD2 updates—hence NULLs in the identity column.&lt;/P&gt;&lt;P&gt;If you can, use a deterministic surrogate key (for example, a hash of the natural/business key) so SCD2 can stay stable across merges&lt;/P&gt;</description>
    <pubDate>Sun, 22 Feb 2026 23:03:07 GMT</pubDate>
    <dc:creator>pradeep_singh</dc:creator>
    <dc:date>2026-02-22T23:03:07Z</dc:date>
    <item>
      <title>Identity column has null values</title>
      <link>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/148794#M52972</link>
      <description>&lt;P&gt;I want to update a dimension table in the gold model from a silver table by using&amp;nbsp; create_auto_cdc_from_snapshot_flow and SCD2. In the target table, I have defined an IDENTITY column, which should be populated automatically.&lt;/P&gt;&lt;P&gt;The dlt flow runs successfully, but then all the values in the identity column are NULLs. Why?!&lt;/P&gt;</description>
      <pubDate>Thu, 19 Feb 2026 11:28:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/148794#M52972</guid>
      <dc:creator>yit337</dc:creator>
      <dc:date>2026-02-19T11:28:00Z</dc:date>
    </item>
    <item>
      <title>Re: Identity column has null values</title>
      <link>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/149000#M53002</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/limitations" target="_self"&gt;Identity columns have certain limitations&lt;/A&gt; with SDP . Here is what the documentation says .&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Identity columns are not supported with tables that are the target of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.databricks.com/aws/en/ldp/cdc" target="_blank" rel="noopener"&gt;AUTO CDC&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;processing.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Because identity columns aren’t supported on targets of AUTO CDC (create_auto_cdc_from_snapshot_flow/APPLY CHANGES), the engine won’t generate values during the SCD2 updates—hence NULLs in the identity column.&lt;/P&gt;&lt;P&gt;If you can, use a deterministic surrogate key (for example, a hash of the natural/business key) so SCD2 can stay stable across merges&lt;/P&gt;</description>
      <pubDate>Sun, 22 Feb 2026 23:03:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/149000#M53002</guid>
      <dc:creator>pradeep_singh</dc:creator>
      <dc:date>2026-02-22T23:03:07Z</dc:date>
    </item>
    <item>
      <title>Hi @yit337, The reason your identity column values are NU...</title>
      <link>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/150311#M53350</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210475"&gt;@yit337&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;The reason your identity column values are NULL is that the target table created by create_auto_cdc_from_snapshot_flow is a streaming table, and streaming tables do not support identity columns. This is a documented limitation:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/ldp/developer/ldp-sql-ref-create-streaming-table.html#limitations" target="_blank"&gt;https://docs.databricks.com/en/ldp/developer/ldp-sql-ref-create-streaming-table.html#limitations&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The limitation states: "Generated columns, identity columns, and default columns are not supported" on streaming tables.&lt;/P&gt;
&lt;P&gt;When you use create_auto_cdc_from_snapshot_flow (or any of the AUTO CDC / apply_changes APIs), the target must be a streaming table created with dlt.create_streaming_table(). Since identity columns are not supported on that table type, the column exists in the schema but never gets auto-populated, which results in NULLs.&lt;/P&gt;
&lt;P&gt;WORKAROUND OPTIONS&lt;/P&gt;
&lt;P&gt;1. USE A SURROGATE KEY FUNCTION INSTEAD&lt;/P&gt;
&lt;P&gt;Replace the identity column with a deterministic surrogate key. You can generate a unique key using one of these approaches:&lt;/P&gt;
&lt;PRE&gt;-- Option A: hash-based key from your business keys
SELECT md5(concat_ws('|', key_col1, key_col2)) AS surrogate_key, ...

-- Option B: monotonically_increasing_id() in a downstream step
SELECT monotonically_increasing_id() AS surrogate_key, ...&lt;/PRE&gt;
&lt;P&gt;You would compute this in your source query or in a downstream materialized view / Delta table that reads from the streaming table.&lt;/P&gt;
&lt;P&gt;2. WRITE THE SCD2 RESULT TO A REGULAR DELTA TABLE AS A DOWNSTREAM STEP&lt;/P&gt;
&lt;P&gt;Keep your create_auto_cdc_from_snapshot_flow pipeline as-is (without the identity column on the streaming table). Then create a separate Delta table with the identity column defined and populate it from the streaming table output:&lt;/P&gt;
&lt;PRE&gt;CREATE TABLE gold.my_dimension (
sk BIGINT GENERATED ALWAYS AS IDENTITY,
business_key STRING,
attribute1 STRING,
-- other columns
__START_AT TIMESTAMP,
__END_AT TIMESTAMP
);

INSERT INTO gold.my_dimension (business_key, attribute1, __START_AT, __END_AT)
SELECT business_key, attribute1, __START_AT, __END_AT
FROM your_streaming_table_output;&lt;/PRE&gt;
&lt;P&gt;Because this is a regular Delta table (not a streaming table), the identity column values will be auto-generated on insert.&lt;/P&gt;
&lt;P&gt;3. USE ROW_NUMBER() IN A MATERIALIZED VIEW&lt;/P&gt;
&lt;P&gt;If you need a sequential surrogate key for your star schema dimension and want it managed within the pipeline, you can create a materialized view downstream:&lt;/P&gt;
&lt;PRE&gt;CREATE OR REFRESH MATERIALIZED VIEW gold.my_dimension AS
SELECT
row_number() OVER (ORDER BY business_key, __START_AT) AS sk,
*
FROM your_streaming_table;&lt;/PRE&gt;
&lt;P&gt;Note that row_number() values may shift if the underlying data changes, so this works best when the data is stable. For a truly stable surrogate key, the hash-based approach (Option 1) or the separate Delta table with an identity column (Option 2) is more reliable.&lt;/P&gt;
&lt;P&gt;SUMMARY&lt;/P&gt;
&lt;P&gt;The core issue is that identity columns and streaming tables are incompatible. Since the AUTO CDC APIs require a streaming table target, the identity column silently produces NULLs. The recommended path is to either use a hash-based surrogate key directly in the pipeline, or add a downstream regular Delta table where the identity column can function as expected.&lt;/P&gt;
&lt;P&gt;For more context, you may also find your related question about streaming tables and Gold layer star schemas relevant:&lt;BR /&gt;
&lt;A href="https://community.databricks.com/t5/data-engineering/are-streaming-tables-suitable-for-gold-layer-star-schema/td-p/148695" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/are-streaming-tables-suitable-for-gold-layer-star-schema/td-p/148695&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Documentation references:&lt;BR /&gt;
- Streaming table limitations: &lt;A href="https://docs.databricks.com/en/ldp/developer/ldp-sql-ref-create-streaming-table.html#limitations" target="_blank"&gt;https://docs.databricks.com/en/ldp/developer/ldp-sql-ref-create-streaming-table.html#limitations&lt;/A&gt;&lt;BR /&gt;
- Identity columns: &lt;A href="https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html" target="_blank"&gt;https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html&lt;/A&gt;&lt;BR /&gt;
- AUTO CDC from snapshot: &lt;A href="https://docs.databricks.com/en/ldp/cdc.html" target="_blank"&gt;https://docs.databricks.com/en/ldp/cdc.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;
&lt;P&gt;If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 04:23:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/identity-column-has-null-values/m-p/150311#M53350</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-09T04:23:23Z</dc:date>
    </item>
  </channel>
</rss>

