Hey @zero234
Yes, your assumption matches mine: your pipeline reads data from JSON files, inserts it into a streaming table, and then tries to compare the table's values with pre-calculated values before any data has been written. The comparison therefore runs against an empty table, which produces the error.
Possible solutions:
- Don't perform the comparison in the same notebook that creates the table. Run it from a separate notebook, or a trigger that fires after the table has received data, so the comparison only happens when there is actual data to compare. You can also orchestrate this with a job that triggers the DLT pipeline first and runs the comparison afterwards.
- Before comparing, modify the comparison code to explicitly check whether the streaming table has received any data, e.g. with `DataFrame.isEmpty()` (PySpark 3.3+) or a row-count check, and proceed only if rows exist.
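The second approach can be sketched as a small guard in Python. This is a minimal, hedged sketch: `compare_when_ready` and `read_table` are hypothetical names, and the `read_table` callable stands in for however you fetch rows (with PySpark, you might pass something like `lambda name: spark.read.table(name).collect()`; with PySpark 3.3+ you could instead test emptiness directly via `df.isEmpty()`).

```python
def compare_when_ready(read_table, expected, table_name="my_streaming_table"):
    """Compare a table's contents with pre-calculated values, but only
    once the table has actually received data.

    read_table: callable taking a table name and returning its rows as a list.
                (Hypothetical injection point; with Spark it would wrap
                spark.read.table(...).collect().)
    expected:   the pre-calculated rows to compare against.
    table_name: placeholder name -- substitute your own streaming table.
    """
    rows = read_table(table_name)
    if not rows:
        # Table is still empty -- defer the comparison instead of failing.
        print(f"{table_name} has no data yet; skipping comparison for now")
        return None
    # Order-insensitive comparison of actual vs. expected rows.
    return sorted(rows) == sorted(expected)
```

Returning `None` when the table is empty lets a scheduler or retry loop distinguish "not ready yet" from a genuine mismatch (`False`).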
Leave a like if this helps! Kudos,
Palash