I have created a DLT pipeline which reads data from json files which are stored in databricks volum

zero234 — Tue, 20 Feb 2024 13:20:33 GMT

I have created a DLT pipeline that reads data from JSON files stored in a Databricks volume and writes it into a streaming table.
This was working fine.
When I tried to read the data inserted into the table and compare the values with precalculated ones in the same DLT pipeline, it fails.

Is it because DLT treats this as an initialization stage and executes these comparisons before setting up the tables or inserting data into them?

Re: I have created a DLT pipeline which reads data from json files which are stored in databricks v

Palash01 — Wed, 21 Feb 2024 02:01:31 GMT

Hey @zero234

Yes, your assumption matches mine. Your pipeline reads data from JSON files, inserts it into a streaming table, and then tries to compare values in the table with precalculated values before any data has been written. The comparison therefore runs against an empty table, which results in the error.

Possible solutions:

Don't perform the comparison in the same notebook as the table creation. Create a separate notebook or trigger that runs after the table has received data, so the comparison happens only when there is actual data to compare. You can also set this up as a Databricks job that triggers the DLT pipeline first and runs the comparison afterwards.
Alternatively, modify the comparison code to explicitly check whether the streaming table has received any data before comparing. You can call isEmpty() on the table's DataFrame, or use similar logic, to confirm there is data before proceeding.
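
A rough sketch of that emptiness check, meant to run in a separate notebook after the pipeline update has finished. The function name check_row_count, the table name, and the idea of comparing a total row count are all assumptions for illustration; your actual comparison will likely look at column values instead. It relies on DataFrame.isEmpty(), which needs PySpark 3.3 or later, and on the spark session that Databricks notebooks provide.

```python
def check_row_count(spark, table_name, expected_rows):
    """Compare a table's row count with a precalculated value.

    Hypothetical helper: `spark` is the active SparkSession and
    `table_name` is the streaming table written by the DLT pipeline.
    """
    # Batch read of the table after the pipeline has run.
    df = spark.read.table(table_name)

    # Skip the comparison instead of failing when no rows have been written yet.
    if df.isEmpty():
        return "skipped: table is empty"

    actual_rows = df.count()
    if actual_rows == expected_rows:
        return "ok"
    return f"mismatch: expected {expected_rows}, got {actual_rows}"
```

In a notebook this could be called as check_row_count(spark, "my_catalog.my_schema.my_table", 1000), where the table name and the expected count are placeholders for your own values.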

Re: I have created a DLT pipeline which reads data from json files which are stored in databricks v

AmanSehgal — Wed, 21 Feb 2024 12:24:47 GMT

Keep your DLT code separate from your comparison code, and run your comparison code once your DLT data has been ingested.