03-08-2023 05:41 PM
Hi All,
I'm new to Databricks and working towards the Associate Engineer Certification.
While going through the section "Build Data Pipelines with Delta Live Tables", I'm trying to implement Change Data Capture, but the workflow errors out when it runs.
I'm not sure whether my code is incorrect, as it is similar to what we have in the course material. Please see the details below and kindly let me know how to fix this.
[Screenshot of the notebook used in the pipeline definition; code and error text below.]
Code
CREATE OR REFRESH STREAMING LIVE TABLE SCD2_RAW
AS SELECT
  current_timestamp() AS load_time,
  right(input_file_name(), 13) AS source_file,
  *
FROM json.`dbfs:/FileStore/tables/J_File_1.json`;
CREATE OR REFRESH STREAMING LIVE TABLE SCD2_SILVER;
APPLY CHANGES INTO LIVE.SCD2_SILVER
FROM STREAM(LIVE.SCD2_RAW)
KEYS (userid)
SEQUENCE BY load_time
COLUMNS * EXCEPT (load_time, source_file);
-- STORED AS SCD TYPE 1
-- TRACK HISTORY ON (userid, name, city);
Error
org.apache.spark.sql.AnalysisException: 'SCD2_RAW' is a streaming table, but 'SCD2_RAW' was not read as a stream. Either remove the STREAMING keyword after the CREATE clause or read the input as a stream rather than a table.
Thanks
03-09-2023 07:11 AM
Having had a quick look, I think your error is because you are trying to add SCD to a STREAMING LIVE table. I believe APPLY CHANGES INTO cannot be used on a streaming table, though you can use a streaming table as a source.
Simply changing this line:
CREATE OR REFRESH STREAMING LIVE TABLE SCD2_SILVER;
to:
CREATE OR REFRESH LIVE TABLE SCD2_SILVER;
should be sufficient.
Do make sure you are running a compatible Databricks runtime version. Also, if you want to use TRACK HISTORY, you need to set the pipeline cluster configuration pipelines.enableTrackHistory to true.
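For reference, that setting goes in the pipeline's JSON settings under the configuration map. A minimal sketch (a real settings file has other keys such as name, clusters, and libraries, omitted here):
{
  "configuration": {
    "pipelines.enableTrackHistory": "true"
  }
}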
I've also found that the Databricks SQL parser gives syntax errors sometimes, and a little experimenting with removing line breaks, etc., can help track down errors.
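As an aside, the original error also suggests the other fix: reading the input as a stream rather than a table. A sketch of what that might look like in DLT SQL using Auto Loader's cloud_files reader, assuming the files live under dbfs:/FileStore/tables/ and JSON schema inference is acceptable (not a tested drop-in replacement):
CREATE OR REFRESH STREAMING LIVE TABLE SCD2_RAW
AS SELECT
  current_timestamp() AS load_time,            -- ingestion time, used later as the CDC sequence column
  right(input_file_name(), 13) AS source_file, -- file-name suffix, as in the posted code
  *
FROM cloud_files("dbfs:/FileStore/tables/", "json");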
03-09-2023 05:01 PM
Thank you, Kearon, for taking the time to answer.
I tried implementing the suggested change but am now seeing a different error.
Error:
org.apache.spark.sql.AnalysisException: Unsupported SQL statement for table 'SCD2_SILVER': Missing query is not supported.
Modified Code:
CREATE OR REFRESH STREAMING LIVE TABLE SCD2_RAW
AS SELECT
  current_timestamp() AS load_time,
  right(input_file_name(), 13) AS source_file,
  *
FROM json.`dbfs:/FileStore/tables/J_File_1.json`;
CREATE OR REFRESH LIVE TABLE SCD2_SILVER;
APPLY CHANGES INTO LIVE.SCD2_SILVER
FROM STREAM(LIVE.SCD2_RAW)
KEYS (userid)
SEQUENCE BY load_time
COLUMNS * EXCEPT (load_time, source_file);
Thanks
03-10-2023 01:17 AM
@GURUPRASAD MADAPURA VENKATESHAIAH, I've had that error before and am trying to remember the cause. In the meantime, here is a notebook I have that is very similar and works for me. If you build yours up step by step, using this as a template, hopefully that will do it. Obviously, you don't need the JSON explosion parts or the WHERE clause; those just handle some messiness in the data I'm processing.
CREATE OR REFRESH STREAMING LIVE TABLE currStudents_ingest
AS SELECT
  col.*,
  file_modification_time
FROM (
  -- flatten the students array: one output row per student
  SELECT fi.file_modification_time, EXPLODE_OUTER(fi.students)
  FROM STREAM(LIVE.currStudents_streamFiles) AS fi
)
WHERE col.id IS NOT NULL;

-- target table for APPLY CHANGES, declared without a query
CREATE OR REFRESH STREAMING LIVE TABLE currStudents_SCD;

APPLY CHANGES INTO live.currStudents_SCD
FROM stream(live.currStudents_ingest)
KEYS (id)                           -- key used to match change records
SEQUENCE BY file_modification_time  -- ordering column for applying changes
STORED AS SCD TYPE 2
TRACK HISTORY ON * EXCEPT (file_modification_time);
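One thing worth noting in this template: the APPLY CHANGES target (currStudents_SCD) is itself declared as a bare STREAMING LIVE TABLE with no query, whereas an ordinary LIVE TABLE always needs a defining AS SELECT, which may be what the "Missing query" error above was complaining about. A minimal sketch of the two forms (table and column names here are hypothetical):
-- a plain live table must carry its defining query:
CREATE OR REFRESH LIVE TABLE example_materialized
AS SELECT * FROM LIVE.example_source;

-- a streaming live table used as an APPLY CHANGES target may be declared bare:
CREATE OR REFRESH STREAMING LIVE TABLE example_scd_target;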
03-20-2023 05:01 AM
Hi @GURUPRASAD MADAPURA VENKATESHAIAH, we haven't heard from you since the last response from @Kearon McNicol, and I was checking back to see if his suggestions helped you.
If they did, or if you found another solution, please share it with the community, as it can be helpful to others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.