Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT behaving differently when reading CDF with Python syntax vs SQL syntax

Puspak
New Contributor II

I was trying to read the CDF data of a table as a DLT materialized view.

It works fine with SQL syntax, reading all the columns of the source table along with the three CDF columns _change_type, _commit_timestamp and _commit_version:

import dlt

@dlt.table()
def change_table():
    # table_changes() takes the table name as a string plus a starting version
    df_change = spark.sql("SELECT * FROM table_changes('<source_table_name>', 1)")
    return df_change
 
But when I try the same with Python, it reads only the columns of the source table, leaving out the CDF columns _change_type, _commit_timestamp and _commit_version:
@dlt.table()
def change_table():
    df_change = spark.read.option('readChangeFeed', 'True').option('startingVersion', 1).table('<source_table_name>')
    return df_change
1 REPLY

Puspak
New Contributor II

But the same Python code works fine when executed outside of a DLT pipeline. When I run the following in an interactive notebook, it returns the source columns plus the CDF columns, which is logical because I am using the readChangeFeed option while reading.

spark.read.option('readChangeFeed', 'True').option('startingVersion', 1).table('<source_table_name>')

The problem I described occurs only when the code is executed within a DLT pipeline, which is strange.
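A quick way to see the difference is to compare the column lists returned by the two read paths side by side. This is only a minimal sketch: '<source_table_name>' is a placeholder, and it assumes CDF is enabled on that Delta table.

# Minimal sketch; '<source_table_name>' is a placeholder for a CDF-enabled Delta table.
df_reader = spark.read.option('readChangeFeed', 'True').option('startingVersion', 1).table('<source_table_name>')
df_sql = spark.sql("SELECT * FROM table_changes('<source_table_name>', 1)")

# In an interactive notebook both lists include _change_type, _commit_timestamp and
# _commit_version; per the behaviour described above, the DataFrameReader result
# drops them only when this runs inside a DLT pipeline.
print(sorted(df_reader.columns))
print(sorted(df_sql.columns))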
