Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

CDC with Snapshot - next_snapshot_and_version() function

Pw76
New Contributor II

I am trying to use create_auto_cdc_from_snapshot_flow (formerly apply_changes_from_snapshot()); see https://docs.databricks.com/aws/en/dlt/cdc#cdc-from-snapshot.

I am attempting to apply SCD type 2 changes using historic snapshot data.

In the first couple of steps of my process I need to parse JSON, then explode and flatten it. In storage, I am basically getting a single JSON file every day that represents an entire table from my source system.
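To make the parse/explode/flatten step concrete, here is a minimal pure-Python sketch of what the daily file looks like and what the flattening produces (all field names here are made up for illustration; in the pipeline itself this would be done with PySpark's JSON parsing and explode):

```python
import json

# Hypothetical daily file: one JSON document representing an entire
# source table, with a nested "attrs" object per record.
raw = '''
{
  "extracted_at": "2024-06-01",
  "records": [
    {"id": 1, "name": "a", "attrs": {"city": "NYC", "tier": "gold"}},
    {"id": 2, "name": "b", "attrs": {"city": "LA",  "tier": "silver"}}
  ]
}
'''

def explode_and_flatten(doc: dict) -> list[dict]:
    """Explode the records array into rows and flatten nested attrs."""
    rows = []
    for rec in doc["records"]:
        flat = {k: v for k, v in rec.items() if k != "attrs"}
        for k, v in rec.get("attrs", {}).items():
            flat[f"attrs_{k}"] = v
        flat["extracted_at"] = doc["extracted_at"]
        rows.append(flat)
    return rows

rows = explode_and_flatten(json.loads(raw))
print(rows[0])
# {'id': 1, 'name': 'a', 'attrs_city': 'NYC', 'attrs_tier': 'gold', 'extracted_at': '2024-06-01'}
```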

I am running into issues trying to get the CDC process to work as described in the documentation.

1. While trying to implement the "next_snapshot_and_version" function, I am running into issues referencing previous tables (or views) in my medallion structure that are needed to build the first element of the function's output, i.e. the DataFrame representing the snapshot. I get the following error whenever I try to reference any of the other tables I have created in my process:

[REFERENCE_DLT_DATASET_OUTSIDE_QUERY_DEFINITION] Referencing DLT dataset [table I am trying to access in fully qualified form] outside the dataset query definition (i.e., @dlt.table annotation) is not supported. Please read it instead inside the dataset query definition.

I've only been able to find a few examples of implementations of the "next_snapshot_and_version" function, but in each case they build the DataFrame from storage-level data rather than from other DLT tables/views. I am not sure whether this is a limitation of the approach, i.e. that I simply cannot reference a DLT table inside my implementation of the required "next_snapshot_and_version" function. I do, however, need to parse/explode/flatten the JSON data first, so going back to the raw storage data is not an option for me.
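For readers unfamiliar with the contract: per the docs, the function takes the latest processed snapshot version (or None on the first run) and returns either a (snapshot DataFrame, version) pair or None to stop processing. The version-advancing part of that logic can be isolated as plain Python, independent of Spark (the yyyymmdd version scheme below is just an illustrative assumption):

```python
from typing import Optional

def next_version(available_versions: list[int],
                 latest_processed: Optional[int]) -> Optional[int]:
    """Return the smallest available snapshot version newer than the
    last one processed, or None when nothing is left to process."""
    candidates = sorted(v for v in available_versions
                        if latest_processed is None or v > latest_processed)
    return candidates[0] if candidates else None

# Daily snapshots encoded as yyyymmdd integers (made-up values).
versions = [20240601, 20240602, 20240603]
print(next_version(versions, None))       # 20240601 (first run)
print(next_version(versions, 20240602))   # 20240603
print(next_version(versions, 20240603))   # None -> processing stops
```

Inside the real "next_snapshot_and_version" you would then load the DataFrame for the chosen version and return (df, version); the error above suggests that this load cannot go through datasets defined in the same pipeline.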

Does anyone have any suggestions or examples that could help, or know whether my approach is incorrect? Thank you.


3 REPLIES

nkarwa
New Contributor II

@Pw76 - I was wondering if you found a solution? I have a similar use case: I want to create an archive/snapshot table using DLT from a non-streaming source (a materialized view) plus a one-time backfill table. I thought create_auto_cdc_from_snapshot_flow could help here, but no luck.

Pw76
New Contributor II

So far I have not found a solution to this. I was, however, able to prove that the approach works by keeping the DLT tables I access to define my snapshot out of the same pipeline. The snapshot and the next_snapshot_and_version function I had to write worked and produced the target table I wanted; it only seems to choke when I try to put it all together in one pipeline.

Also, I've found that unless you need your target table to recognize that records are no longer present in your source snapshot, you can just use regular AUTO CDC (without snapshot). It is much easier, and you don't need to implement the problematic next_snapshot_and_version function.
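To illustrate that distinction: detecting that a record has disappeared requires comparing consecutive full snapshots, which is what the snapshot flavor does for you and plain AUTO CDC does not. A toy pure-Python sketch of that diff (keys and values are made up):

```python
def diff_snapshots(prev: dict, curr: dict) -> dict:
    """Classify keyed rows between two full snapshots into the
    change events an SCD type 2 target would need to record."""
    return {
        "inserted": sorted(k for k in curr if k not in prev),
        "deleted":  sorted(k for k in prev if k not in curr),
        "updated":  sorted(k for k in curr
                           if k in prev and prev[k] != curr[k]),
    }

prev = {1: "gold", 2: "silver", 3: "bronze"}
curr = {1: "gold", 2: "gold", 4: "silver"}
print(diff_snapshots(prev, curr))
# {'inserted': [4], 'deleted': [3], 'updated': [2]}
```

The "deleted" bucket is the part you lose with regular AUTO CDC, since a feed of changed rows never mentions rows that simply stopped appearing.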

nkarwa
New Contributor II

@Pw76 - but AUTO CDC (similar to apply_changes) needs a streaming source, and I have a DLT materialized view that gets flushed and refilled.
