12-01-2023 04:29 AM
Up until yesterday I was (sort of) able to read changes from the target tables of APPLY CHANGES operations (either through table_changes() or using readChangeFeed). I say "sort of" because the meta columns (_change_type, _commit_version, _commit_timestamp) were missing from readChangeFeed, but since there were no deletes on my source table I could read upserts just fine. I'm using a pattern similar to this Stack Overflow answer: Databricks Delta Live Tables - Apply Changes from delta table
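For reference, the pattern in question looks roughly like this. This is a sketch only: it runs inside a Databricks DLT pipeline, where `dlt` and `spark` are provided by the runtime, and the table name `silver_customers` is an illustrative stand-in for the target of an APPLY CHANGES operation.

```python
import dlt

# Sketch of the readChangeFeed pattern described above. Runs only inside a
# Databricks DLT pipeline; "silver_customers" is an illustrative name for
# the target table of an APPLY CHANGES operation.
@dlt.view
def silver_customer_changes():
    return (
        spark.readStream
             .option("readChangeFeed", "true")
             .table("silver_customers")
        # A change-feed read normally exposes _change_type, _commit_version,
        # and _commit_timestamp; as noted above, these were missing here.
    )
```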
As of today I'm getting:
AnalysisException: [STREAMING_TABLE_OPERATION_NOT_ALLOWED.APPLY_CHANGES_UNSUPPORTED_OPERATION] The operation SELECT CHANGE DATA FEED is not allowed: The operation is not supported on APPLY CHANGES Streaming Tables.
I guess I should have taken the fact that meta columns were missing as indication that I was doing something wrong. How do you read changes from APPLY CHANGES Streaming Tables?
12-03-2023 08:09 AM
Hi @afk, it seems you’ve been navigating the intricacies of Databricks Delta Live Tables and Change Data Capture (CDC). Let’s unravel this together!
APPLY CHANGES INTO:
The new APPLY CHANGES INTO operation in DLT pipelines automatically and seamlessly handles out-of-order data without manual intervention.
It simplifies your architecture, making it more efficient and scalable.
You can propagate changes to downstream target tables using the APPLY CHANGES INTO statement.
Before executing the query, ensure that the target streaming table exists to hold the most up-to-date data.
Here’s an example of creating a target streaming table:
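In the Python DLT API, that looks roughly like the following. This is a sketch under assumptions: the table names, the key column `customer_id`, and the sequencing column `sequence_num` are all illustrative, and the code runs only inside a Databricks DLT pipeline.

```python
import dlt
from pyspark.sql.functions import col

# Create the target streaming table first, then apply changes into it.
# Runs only inside a Databricks DLT pipeline; all names are illustrative.
dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",        # the streaming table created above
    source="bronze_customers",        # the CDC source feed
    keys=["customer_id"],             # key column(s) used to match rows
    sequence_by=col("sequence_num"),  # ordering column for out-of-order data
    stored_as_scd_type=1,             # keep only the latest row per key
)
```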
Reading Changes: Per the documentation, the target of the APPLY CHANGES INTO query cannot be used as a source for a streaming table. To read from that target, use a materialized view (a non-streaming read) instead.
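A minimal sketch of reading the APPLY CHANGES target with a non-streaming (materialized view) query, as discussed later in this thread. The table name is an assumption, and the code runs only inside a Databricks DLT pipeline.

```python
import dlt

# Non-streaming ("materialized view") read of an APPLY CHANGES target.
# "silver_customers" is an illustrative name; `dlt` and `spark` are
# provided by the Databricks DLT runtime.
@dlt.table
def silver_customers_current():
    return spark.read.table("silver_customers")  # batch read, not readStream
```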
Keep exploring the fascinating world of data pipelines, and may your streams flow smoothly! 🌊🔍🚀
12-03-2023 09:15 AM
Thanks a lot @Kaniz_Fatma for providing more context for my question. Yes, I've been using CDC in a DLT pipeline and was reading changes to a table from one stage (say, bronze) and applying those to the next stage (say, silver) using APPLY CHANGES INTO. So the target of an APPLY CHANGES INTO operation becomes the source for a change data feed (readChangeFeed) for the next stage. Your post quotes the documentation (which I was also familiar with) in pointing out that "The target of the APPLY CHANGES INTO query cannot be used as a source for a streaming table" so I was using a materialized view to read from that target as recommended.
The error I've been getting since Friday (see my first post, it's a new error, the pipeline was working before) seems to indicate I can't read change data using CDC from a target table of an APPLY CHANGES INTO operation anymore. Am I reading this right?
If so, is APPLY CHANGES INTO supposed to be the last operation in every pipeline (applied only to my gold tables, essentially)? And if that's the case, what's the best practice for propagating changes across stages using DLT?
12-03-2023 09:14 PM
Hi @afk, thank you for providing context about your DLT pipeline and your challenges with CDC. Let’s dive into the details.
Remember that DLT pipelines can be complex, and the right design depends on your use case.
Feel free to share more details about the error message or any other challenges you’re facing, and we can explore further solutions! 🚀
12-04-2023 12:30 AM
Thanks @Kaniz_Fatma for your response. I'm still stuck. How do I "apply [changes] to the silver layer" in a DLT pipeline without using APPLY CHANGES INTO (which I'm reserving for my gold layer)? The only way I see is to revert to using MERGE INTO and foreachBatch.
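That fallback would look roughly like the following. This is a hedged sketch, not a definitive implementation: it assumes Delta Lake with the change data feed enabled on a `bronze_customers` table, an illustrative key column `customer_id`, and it glosses over delete handling and checkpoint management.

```python
from delta.tables import DeltaTable

# Sketch of the MERGE INTO + foreachBatch fallback mentioned above.
# Requires a Spark session with Delta Lake; all names and paths are
# illustrative, and delete handling (_change_type == 'delete') is
# omitted for brevity.
def upsert_to_silver(micro_batch_df, batch_id):
    silver = DeltaTable.forName(micro_batch_df.sparkSession, "silver_customers")
    (silver.alias("t")
           .merge(micro_batch_df.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()      # update existing keys
           .whenNotMatchedInsertAll()   # insert new keys
           .execute())

(spark.readStream
      .option("readChangeFeed", "true")
      .table("bronze_customers")
      .writeStream
      .foreachBatch(upsert_to_silver)
      .option("checkpointLocation", "/tmp/checkpoints/silver_customers")
      .start())
```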