Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT behaving differently with Python syntax vs SQL syntax when reading CDF

Puspak
New Contributor II

I was trying to read the CDF data of a table as a DLT materialized view.

It works fine with SQL syntax, reading all the columns of the source table along with the 3 CDF columns (_change_type, _commit_timestamp, _commit_version):

import dlt

@dlt.table()
def change_table():
    # Read the change feed of the source table starting at version 1
    df_change = spark.sql("SELECT * FROM table_changes('<source_table_name>', 1)")
    return df_change
 
But when I try the same with Python, it reads only the columns of the source table, leaving out the CDF columns (_change_type, _commit_timestamp, _commit_version):
@dlt.table()
def change_table():
    # Same read, but via the DataFrame reader with the readChangeFeed option
    df_change = (spark.read.option('readChangeFeed', 'True')
                 .option('startingVersion', 1)
                 .table('<source_table_name>'))
    return df_change
2 REPLIES

Puspak
New Contributor II

But the same Python code works fine when executed outside of a DLT pipeline. When I run the following in an interactive notebook, it returns the source columns plus the CDF columns, which is expected because I am using the readChangeFeed option while reading.

spark.read.option('readChangeFeed', 'True').option('startingVersion', 1).table('<source_table_name>')

The problem I described occurs only when the code is executed within a DLT pipeline, which is strange.
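A quick way to confirm this is to print the schema in the notebook; a minimal sketch, with the table name as a placeholder:

# Interactive notebook check (outside DLT): the CDF metadata columns
# _change_type, _commit_timestamp, _commit_version appear in the schema.
df_change = (spark.read.option('readChangeFeed', 'True')
             .option('startingVersion', 1)
             .table('<source_table_name>'))
df_change.printSchema()  # lists the source columns plus the three CDF columns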

mark_ott
Databricks Employee

When accessing Change Data Feed (CDF) data in Delta Live Tables (DLT), the behavior between SQL and Python APIs differs notably regarding CDF metadata columns—_change_type, _commit_timestamp, and _commit_version.

  • SQL Approach (using table_changes):
    The SQL syntax
    SELECT * FROM table_changes('<source_table_name>', 1)
    always returns all columns from the source table plus the three CDF columns.

  • Python API Approach (using .option('readChangeFeed','True')):
    When you use
    spark.read.option('readChangeFeed', 'True').option('startingVersion', 1).table('<source_table_name>')
    inside a DLT pipeline, the returned DataFrame often omits the three CDF columns and shows only the data columns of the source table. This difference in behavior is attributed to the underlying CDF implementation and how schema inference is handled for the Python API in DLT.


How to Access CDF Columns in Python

To ensure the CDF columns are included when reading from a Delta table using the Python API, explicitly reference the relevant columns:

df_change = (spark.read.option('readChangeFeed', 'True')
             .option('startingVersion', 1)
             .table('<source_table_name>'))

# In some environments, this reveals the CDF columns
df_change = df_change.select("*")

# Alternatively, explicitly select the columns if "*" doesn't work:
df_change = df_change.select("*", "_change_type", "_commit_timestamp", "_commit_version")

If the columns still do not appear after using .select("*"), use .select() naming the columns directly as above. Ensure your environment is running on Databricks Runtime 9.0 or newer and that CDF is properly enabled for the Delta table.
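For reference, CDF must be enabled on the source table via a table property before either table_changes or readChangeFeed returns change data; a minimal sketch, with the table name as a placeholder:

# Enable the change data feed on the source table (one-time table property).
spark.sql("""
    ALTER TABLE <source_table_name>
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")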


Workaround and Best Practices

  • Use the SQL interface (table_changes) if you need a direct, complete result including CDF columns in DLT materialized views.

  • For Python, verify the DataFrame schema with df_change.printSchema(). If the CDF columns are missing, switch to an explicit .select() or fall back to using a SQL query within Python (spark.sql("SELECT * FROM table_changes(...)")); see the sketch after this list.

  • This behavior is subject to Databricks and Delta Lake updates; always refer to the current documentation for feature changes.
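A minimal sketch of that SQL-within-Python fallback as a DLT table, with the source table name and starting version as placeholders:

import dlt

@dlt.table()
def change_table():
    # Delegate the CDF read to SQL so the three CDF metadata columns
    # (_change_type, _commit_timestamp, _commit_version) are included.
    return spark.sql("SELECT * FROM table_changes('<source_table_name>', 1)")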


Quick Table Comparison

Method                          | Data Columns | CDF Columns
--------------------------------|--------------|------------
table_changes (SQL)             | Yes          | Yes
spark.read option (Python API)  | Yes          | Usually No

To reliably capture CDF columns in DLT with Python, stick to the SQL-based approach, or check your DataFrame schema and select the columns explicitly. This limitation and its workaround are documented in Databricks.