When accessing Change Data Feed (CDF) data in Delta Live Tables (DLT), the SQL and Python APIs behave differently with respect to the CDF metadata columns `_change_type`, `_commit_timestamp`, and `_commit_version`.
- **SQL approach (`table_changes`):** The SQL syntax

  ```sql
  SELECT * FROM table_changes('<source_table_name>', 1)
  ```

  always returns all columns from the source table plus the three CDF columns.
- **Python API approach (`.option("readChangeFeed", "true")`):** When you use

  ```python
  spark.read.option("readChangeFeed", "true").option("startingVersion", 1).table("<source_table_name>")
  ```

  without explicitly referencing the CDF columns, the returned DataFrame often omits the three CDF columns and shows only the source table's data columns. This known difference in behavior is attributed to the underlying CDF implementation and to how the Python API handles schema inference.
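The SQL form can also be issued from Python via `spark.sql`. A minimal sketch follows; the `cdf_query` helper and the table name `my_schema.orders` are illustrative, not part of the Databricks API:

```python
# Sketch: run the SQL table_changes query from Python instead of the
# DataFrame reader. The helper only builds the query string, so it can
# be used in any notebook; `spark` is assumed to be an active session.
def cdf_query(table_name: str, starting_version: int) -> str:
    """Build a table_changes query returning data plus CDF columns."""
    return f"SELECT * FROM table_changes('{table_name}', {starting_version})"

query = cdf_query("my_schema.orders", 1)
# In a Databricks notebook: df = spark.sql(query)
print(query)  # SELECT * FROM table_changes('my_schema.orders', 1)
```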
How to Access CDF Columns in Python
To ensure the CDF columns are included when reading a Delta table with the Python API, reference them explicitly:

```python
df_change = (
    spark.read
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("<source_table_name>")
)

# In some environments, selecting "*" is enough to surface the CDF columns:
df_change = df_change.select("*")

# If they still do not appear, name the CDF columns directly:
df_change = df_change.select(
    "*",
    "_change_type",
    "_commit_timestamp",
    "_commit_version",
)
```

If the columns still do not appear after `.select("*")`, use the explicit `.select()` naming the columns, as shown above. Also ensure your environment runs Databricks Runtime 9.0 or newer and that CDF is properly enabled for the Delta table.
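Enabling CDF uses the documented Delta Lake table property `delta.enableChangeDataFeed`. A hedged sketch of issuing that statement from Python; the `enable_cdf_sql` helper and table name are illustrative, not part of any API:

```python
# Sketch: turn on the change data feed for an existing Delta table.
# `enable_cdf_sql` is a hypothetical helper that builds the documented
# ALTER TABLE statement; `spark` is assumed to be an active session.
def enable_cdf_sql(table_name: str) -> str:
    """Build the ALTER TABLE statement that enables CDF on a table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )

# In a Databricks notebook: spark.sql(enable_cdf_sql("my_schema.orders"))
```

New tables can instead set the property in the `CREATE TABLE ... TBLPROPERTIES` clause.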
Workaround and Best Practices
- Use the SQL interface (`table_changes`) if you need a direct, complete result, including the CDF columns, in DLT materialized views.
- For Python, verify the DataFrame schema with `df_change.printSchema()`. If the CDF columns are missing, switch to an explicit `.select()` or fall back to a SQL query from Python (`spark.sql("SELECT * FROM table_changes(...)")`).
- This behavior is subject to change with Databricks and Delta Lake updates; always refer to the current documentation for feature changes.
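The schema check and SQL fallback described above can be sketched as follows. The helper names and control flow are assumptions for illustration; `spark` is an active SparkSession, and the schema check itself is pure Python:

```python
# CDF metadata columns that the SQL table_changes function always returns.
CDF_COLUMNS = ("_change_type", "_commit_timestamp", "_commit_version")

def missing_cdf_columns(columns):
    """Return the CDF metadata columns absent from a column list."""
    return [c for c in CDF_COLUMNS if c not in columns]

def read_changes(spark, table_name, starting_version):
    """Read CDF rows, falling back to SQL if the reader drops CDF columns."""
    df = (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )
    if missing_cdf_columns(df.columns):
        # table_changes always includes the CDF columns in its result.
        df = spark.sql(
            f"SELECT * FROM table_changes('{table_name}', {starting_version})"
        )
    return df
```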
Quick Table Comparison
| Method | Data Columns | CDF Columns |
|---|---|---|
| `table_changes` (SQL) | Yes | Yes |
| `spark.read` option (Python API) | Yes | Usually No |
To reliably capture CDF columns in Python, either stick to the SQL-based approach or check your DataFrame schema and select the CDF columns explicitly. This limitation and its workarounds are well documented for Databricks.