I am trying to use create_auto_cdc_from_snapshot_flow (formerly apply_changes_from_snapshot()); see: https://docs.databricks.com/aws/en/dlt/cdc#cdc-from-snapshot
I am attempting to do SCD type 2 changes using historical snapshot data.
In the first couple of steps of my process I need to parse JSON, then explode and flatten it. In storage I am getting a single JSON file every day that represents an entire table from my source system.
I am running into issues trying to get the CDC process to work as described in the documentation.
1. To do this I need to implement the "next_snapshot_and_version" function. The first element of its output is the DataFrame representing the snapshot, and to build that DataFrame I need to reference earlier tables (or views) in my medallion structure. Whenever I try to reference any of the other tables I have created in my process, I get the following error:
[REFERENCE_DLT_DATASET_OUTSIDE_QUERY_DEFINITION] Referencing DLT dataset [table I am trying to access in fully qualified form] outside the dataset query definition (i.e., @dlt.table annotation) is not supported. Please read it instead inside the dataset query definition.
I've only been able to find a few examples of the "next_snapshot_and_version" function, and in each case the DataFrame is built directly from storage-level data rather than from other DLT tables/views. I am not sure whether this is a limitation of the approach, meaning that I cannot reference a DLT table inside my implementation of "next_snapshot_and_version". I do, however, need to parse/explode/flatten the JSON data first, so reading the raw storage data as-is is not an option for me.
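For reference, this is the contract I understand from the documentation: the function receives the latest snapshot version already processed (None on the first run) and returns a tuple of (snapshot DataFrame, version), or None when there is nothing newer to process. Below is a minimal sketch of that shape, not my actual pipeline code. The names make_next_snapshot_and_version, load_snapshot, and fake_load are hypothetical, as are the integer date versions, and the loader is a plain-Python stand-in so the sketch runs without Spark.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def make_next_snapshot_and_version(
    available_versions: Sequence[int],
    load_snapshot: Callable[[int], Any],
) -> Callable[[Optional[int]], Optional[Tuple[Any, int]]]:
    """Build a next_snapshot_and_version callable over a known set of versions.

    available_versions and load_snapshot are hypothetical inputs for this
    sketch; in the examples I found, the loader is a read against storage.
    """
    ordered = sorted(available_versions)

    def next_snapshot_and_version(
        latest_snapshot_version: Optional[int],
    ) -> Optional[Tuple[Any, int]]:
        for version in ordered:
            # None means nothing has been processed yet: start with the oldest.
            if latest_snapshot_version is None or version > latest_snapshot_version:
                return load_snapshot(version), version
        # No snapshot newer than the last processed one.
        return None

    return next_snapshot_and_version


def fake_load(version: int) -> list:
    # Stand-in for building the snapshot DataFrame (assumption for the sketch).
    return [{"id": 1, "snapshot": version}]


next_fn = make_next_snapshot_and_version([20240103, 20240101, 20240102], fake_load)
```

The part I cannot get working is the loader: the step that builds the snapshot is where I would like to read from my already parsed/flattened DLT table, and that read is what raises the error above.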
Does anyone have suggestions or examples that could help, or know whether my approach is incorrect? Thank you.