Delta live table: Retrieve CDF columns

LauJohansson — Mon, 02 Sep 2024 12:42:56 GMT

I want to use the apply_changes feature from a bronze table to a silver table. The bronze table has no "natural" sequence_by column, so I want to use the CDF column "_commit_timestamp" as the sequence_by.

How do I retrieve the CDF columns in a DLT setup?

Re: Delta live table: Retrieve CDF columns

raphaelblg — Tue, 03 Sep 2024 18:54:13 GMT

Hi @LauJohansson, here's example source code for a DLT pipeline that reads the change data feed (CDF) of a bronze table and uses the apply_changes function to upsert into your silver table. The _commit_timestamp column is used as the sequence, so the latest row-level change for each key wins:

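    import dlt
    from pyspark.sql.functions import col

    # Fully qualified name of the Delta table that has change data feed enabled
    source = "catalog.schema.cdf_enabled_table"
    bronze = "cdf_enabled_table_sequency_by_bronze"
    silver = "cdf_enabled_table_sequency_by_silver"

    # View over the source table's change feed; the CDF metadata columns
    # (_change_type, _commit_version, _commit_timestamp) come with the data columns
    @dlt.view(name=bronze)
    def cdf_enabled_table_sequency_by_bronze():
        return spark.readStream.option("readChangeFeed", "true").table(source)

    # Target streaming table for the upsert
    dlt.create_streaming_table(name=silver)

    # SCD type 1 upsert keyed on id, ordered by the CDF commit timestamp
    dlt.apply_changes(
        target=silver,
        source=bronze,
        keys=["id"],
        sequence_by=col("_commit_timestamp"),
        stored_as_scd_type=1,
    )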

It seems you're using this setup for data deduplication. It should work, but please consider the APPLY_CHANGES limitations and the CDF limitations while designing your pipeline.
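To make the sequencing behaviour easier to reason about, here is a minimal plain-Python sketch of what an SCD type 1 upsert ordered by _commit_timestamp does to a batch of change rows. It is a toy model only, not how DLT implements apply_changes. The function name apply_changes_scd1, the sample rows, and the choices to skip update_preimage rows and to treat delete rows as removals are assumptions made for illustration; only the _change_type values and the _commit_timestamp column come from the change data feed itself.

```python
from datetime import datetime


def apply_changes_scd1(change_rows, key="id", sequence_by="_commit_timestamp"):
    """Toy SCD type 1 upsert: per key, keep the change with the highest sequence value."""
    target = {}     # key -> current row in the "silver" table
    last_seen = {}  # key -> highest sequence value applied so far

    for row in change_rows:
        # Assumption: pre-images carry the old values and share the commit
        # timestamp of their post-image, so this sketch ignores them.
        if row["_change_type"] == "update_preimage":
            continue

        k, seq = row[key], row[sequence_by]

        # A change older than what was already applied for this key is dropped.
        # Equal sequence values are ambiguous; here the later arrival wins.
        if k in last_seen and seq < last_seen[k]:
            continue
        last_seen[k] = seq

        if row["_change_type"] == "delete":
            target.pop(k, None)
        else:
            # Strip the CDF metadata columns before storing the row.
            target[k] = {c: v for c, v in row.items() if not c.startswith("_")}

    return target


# Hypothetical change feed rows for a table keyed on "id".
changes = [
    {"id": 1, "name": "alice", "_change_type": "insert",
     "_commit_timestamp": datetime(2024, 9, 1, 10, 0)},
    {"id": 2, "name": "bob", "_change_type": "insert",
     "_commit_timestamp": datetime(2024, 9, 1, 10, 0)},
    {"id": 1, "name": "alice", "_change_type": "update_preimage",
     "_commit_timestamp": datetime(2024, 9, 2, 8, 0)},
    {"id": 1, "name": "alicia", "_change_type": "update_postimage",
     "_commit_timestamp": datetime(2024, 9, 2, 8, 0)},
    {"id": 2, "name": "bob", "_change_type": "delete",
     "_commit_timestamp": datetime(2024, 9, 3, 9, 0)},
    # Arrives last but carries an older commit timestamp, so it is ignored.
    {"id": 1, "name": "stale", "_change_type": "update_postimage",
     "_commit_timestamp": datetime(2024, 9, 1, 12, 0)},
]

silver = apply_changes_scd1(changes)
print(silver)  # {1: {'id': 1, 'name': 'alicia'}}
```

The sketch points at two things worth checking in a real pipeline: an update produces both a pre-image and a post-image row with the same commit timestamp, and delete rows in the feed are ordinary rows unless the pipeline is told to treat them as deletes (in DLT, apply_changes has an apply_as_deletes argument for that purpose).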

Re: Delta live table: Retrieve CDF columns

raphaelblg — Tue, 03 Sep 2024 19:47:37 GMT

Delta Live Tables Python language reference: https://docs.databricks.com/en/delta-live-tables/python-ref.html

Delta Live Tables SQL language reference: https://docs.databricks.com/en/delta-live-tables/sql-ref.html

Re: Delta live table: Retrieve CDF columns

LauJohansson — Wed, 04 Sep 2024 09:48:39 GMT

Thank you @raphaelblg!

I chose to write an article on the subject after this discussion: https://www.linkedin.com/pulse/databricks-delta-live-tables-merging-lau-johansson-cdtce/?trackingId=L872gj0yQouXgJudM75gdw%3D%3D