07-13-2023 09:49 PM - edited 07-13-2023 09:52 PM
Hello,
I have some data sitting in Snowflake, and I want to apply CDC to it using Delta Live Tables, but I am running into an issue.
Here is what I am trying to do:
@dlt.view()
def table1():
    return spark.read.format("snowflake").options(**options).option("query", query).load()

dlt.create_streaming_table("target")

dlt.apply_changes(
    source = "table1",
    target = "target",
    ....
)
The same code runs fine if I am reading a Delta table, but with Snowflake I get the following error:
'org.apache.spark.sql.AnalysisException: Source data for the APPLY CHANGES target 'XXXXX' must be a streaming query'
Is there a solution or a workaround you can help me with?
07-17-2023 12:43 AM
The CDC support in Delta Live Tables works fine for Delta tables, as you have noticed. However, it is not a full-blown CDC implementation.
If you want to capture changes in Snowflake, you will have to implement some CDC method on Snowflake itself, and read those changes into Databricks.
There are several approaches to this, such as using Snowflake Streams or commercial CDC software.
Depending on your scenario, you will also have to put an event queue between Snowflake and Databricks (like Kafka, Pulsar, ...).
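Not the exact code, but a rough sketch of what the Databricks side of that could look like, assuming the Snowflake change events are already being published to a Kafka topic. The broker address, topic name, event schema, and key/sequence columns below are made-up placeholders, not something your pipeline has to match:

import dlt
from pyspark.sql.functions import col, expr, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical schema of the change events coming out of Snowflake
change_schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("operation", StringType()),    # e.g. INSERT / UPDATE / DELETE
    StructField("changed_at", TimestampType()),
])

@dlt.view()
def table1_changes():
    # Streaming read from the (hypothetical) Kafka topic carrying the CDC events
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "<broker:9092>")   # placeholder broker
        .option("subscribe", "snowflake_table1_changes")      # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), change_schema).alias("data"))
        .select("data.*")
    )

dlt.create_streaming_table("target")

dlt.apply_changes(
    source = "table1_changes",
    target = "target",
    keys = ["id"],
    sequence_by = col("changed_at"),
    apply_as_deletes = expr("operation = 'DELETE'"),
)

Because the source view is now a streaming read, APPLY CHANGES no longer complains that the source must be a streaming query.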
07-23-2023 09:45 PM - edited 07-24-2023 07:59 AM
OK, I got the point, and thank you for your response.
So here is how my data is organised: I should be working with table1, but since it grows fast and I can't always load it into Databricks as a materialised table, the idea was
What do you think would be the best approach in this case if we are working with DLT?
07-26-2023 11:25 PM
Finally, I followed the steps from this blog, and everything works fine.
I just assumed that I have tables as sources rather than flat files.
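For anyone landing here later, the shape of the final pipeline is roughly like the sketch below; all table, column, and key names are placeholders rather than my actual ones. The change records are first landed in a Delta table, and apply_changes reads from that table as a stream instead of running a batch query against Snowflake:

import dlt
from pyspark.sql.functions import col, expr

@dlt.view()
def table1_cdc():
    # Streaming read of the landed change records (table name is a placeholder)
    return spark.readStream.table("raw.table1_cdc")

dlt.create_streaming_table("table1_silver")

dlt.apply_changes(
    source = "table1_cdc",
    target = "table1_silver",
    keys = ["id"],
    sequence_by = col("updated_at"),
    apply_as_deletes = expr("operation = 'DELETE'"),
    except_column_list = ["operation", "updated_at"],
)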
Happy reading!
03-21-2024 11:11 AM
Hi @Khalil ,
Can you share whether you worked with Unity Catalog or HMS?
03-23-2024 06:43 PM
Hi @data-engineer-d ,
I am using HMS, but at the same time I am currently experimenting with UC, as we are planning to use it for better data management.