<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CDF metadata columns are lost after importing dlt in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/119057#M45780</link>
<description>&lt;P&gt;Hi Lou,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Thank you for the explanation!&amp;nbsp;In my case, I was reading a CDF table outside the &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;pipeline, but I need to import some functions from our shared ETL modules, which import the &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;library. The behaviour is altered simply by having the import, even without using any &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;functionality.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 13 May 2025 14:59:08 GMT</pubDate>
    <dc:creator>Ru</dc:creator>
    <dc:date>2025-05-13T14:59:08Z</dc:date>
    <item>
      <title>CDF metadata columns are lost after importing dlt</title>
      <link>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/118713#M45691</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;Databricks Community&lt;/SPAN&gt;,&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I attempted to read the Change Feed from a CDF-enabled table. Initially, the correct table schema, including the metadata columns (&lt;/SPAN&gt;&lt;SPAN&gt;_change_type&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;_commit_version&lt;/SPAN&gt;&lt;SPAN&gt;, and &lt;/SPAN&gt;&lt;SPAN&gt;_commit_timestamp&lt;/SPAN&gt;&lt;SPAN&gt;), was returned as expected. However, after importing the &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;library and reading the changes again, the metadata columns were missing. Could you help me resolve this issue? Thank you in advance!&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;LI-CODE lang="python"&gt;# Databricks notebook source
changeset_cols_before = (
    spark.read
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("&amp;lt;path_of_CDF_enabled_table&amp;gt;")
    .columns
)

# COMMAND ----------

import dlt

# COMMAND ----------

changeset_cols_after = (
    spark.read
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("&amp;lt;path_of_CDF_enabled_table&amp;gt;")
    .columns
)

# COMMAND ----------

missing_cols = [col for col in changeset_cols_before if col not in changeset_cols_after]
print(missing_cols)

# result: ['_change_type', '_commit_version', '_commit_timestamp']&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 09 May 2025 16:47:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/118713#M45691</guid>
      <dc:creator>Ru</dc:creator>
      <dc:date>2025-05-09T16:47:43Z</dc:date>
    </item>
    <item>
      <title>Re: CDF metadata columns are lost after importing dlt</title>
      <link>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/118715#M45692</link>
<description>&lt;DIV class="paragraph"&gt;The issue stems from the interaction between the Change Data Feed (CDF) metadata columns (&lt;CODE&gt;_change_type&lt;/CODE&gt;, &lt;CODE&gt;_commit_version&lt;/CODE&gt;, &lt;CODE&gt;_commit_timestamp&lt;/CODE&gt;) and the Delta Live Tables (DLT) library: once the &lt;CODE&gt;dlt&lt;/CODE&gt; module is imported, reads of the CDF-enabled table no longer return these metadata columns.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;To address this issue: 1. &lt;STRONG&gt;Understanding the Cause&lt;/STRONG&gt;: By default, DLT pipelines enable CDF for better propagation of change data. However, when importing DLT, if the target table also contains columns that are reserved for CDF (_change_type, _commit_version, _commit_timestamp), the framework can skip exposing these reserved metadata columns due to conflicts or internal handling, as outlined in relevant documentation.&lt;/DIV&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Best Practice Adjustments&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Use the &lt;CODE&gt;except_column_list&lt;/CODE&gt; parameter in &lt;CODE&gt;dlt.apply_changes()&lt;/CODE&gt;, or drop the columns explicitly in your code when dealing with append-only streaming tables. For example:&lt;LI-CODE lang="python"&gt;import dlt

@dlt.table
def my_table():
    df = (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", 0)
        .table("&amp;lt;path_of_CDF_enabled_table&amp;gt;")
    )
    # Drop the reserved CDF metadata columns before materializing the table
    return df.drop("_change_type", "_commit_version", "_commit_timestamp")&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;This drops these reserved metadata columns from the read DataFrame, mitigating the problem.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Schema Management&lt;/STRONG&gt;: Ensure these reserved column names are excluded or renamed in the source table when CDF is enabled, as conflicting column names can lead to ambiguity.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;General Steps&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Perform the initial read before importing DLT and save the schema if required for downstream operations.&lt;/LI&gt;
&lt;LI&gt;Post-import, reconfigure your read logic to accommodate the absence of the columns or filter them out explicitly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
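The column-filtering step above can be sketched as a small runnable helper (the three reserved names are the documented CDF metadata columns; the helper name is made up for illustration):

```python
# Reserved Delta Change Data Feed metadata columns.
RESERVED_CDF_COLS = {"_change_type", "_commit_version", "_commit_timestamp"}

def strip_cdf_metadata(columns):
    """Return the column list without the reserved CDF metadata columns."""
    return [c for c in columns if c not in RESERVED_CDF_COLS]

cols = ["id", "name", "_change_type", "_commit_version", "_commit_timestamp"]
print(strip_cdf_metadata(cols))  # ['id', 'name']
```

The same list comprehension can be applied to a DataFrame via select() when drop() is not convenient.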
&lt;DIV class="paragraph"&gt;Hope this helps, Lou.&lt;/DIV&gt;</description>
      <pubDate>Fri, 09 May 2025 17:52:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/118715#M45692</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-09T17:52:43Z</dc:date>
    </item>
    <item>
      <title>Re: CDF metadata columns are lost after importing dlt</title>
      <link>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/119057#M45780</link>
<description>&lt;P&gt;Hi Lou,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Thank you for the explanation!&amp;nbsp;In my case, I was reading a CDF table outside the &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;pipeline, but I need to import some functions from our shared ETL modules, which import the &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;library. The behaviour is altered simply by having the import, even without using any &lt;/SPAN&gt;&lt;SPAN&gt;dlt&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;functionality.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
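One hedged workaround for the shared-module situation described above is to defer the dlt import into the pipeline-only helpers, so that merely importing the ETL module from a plain notebook does not trigger dlt's side effects. A minimal runnable sketch of the deferred-import pattern, with json standing in for dlt (which is only available on Databricks):

```python
# etl_shared.py (hypothetical shared module): keep dlt out of module scope
# so importing this module from a plain notebook has no side effects.

def helper_columns(columns):
    """A pipeline-agnostic helper: needs no dlt, safe to import anywhere."""
    return [c for c in columns if not c.startswith("_")]

def pipeline_only():
    # Deferred import: only executed when a DLT pipeline calls this helper.
    # json stands in for dlt in this sketch so the pattern is runnable.
    import json as dlt_standin
    return dlt_standin.dumps({"deferred": True})

print(helper_columns(["id", "_change_type"]))  # ['id']
```

Whether this fully restores the CDF metadata columns depends on how the dlt import alters the session, so treat it as a pattern to try rather than a confirmed fix.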
      <pubDate>Tue, 13 May 2025 14:59:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdf-metadata-columns-are-lost-after-importing-dlt/m-p/119057#M45780</guid>
      <dc:creator>Ru</dc:creator>
      <dc:date>2025-05-13T14:59:08Z</dc:date>
    </item>
  </channel>
</rss>

