Hello,
I'm working on a Delta Live Tables (DLT) pipeline where I need to implement a conditional step that only triggers under specific conditions. Here's the challenge I'm facing:
- I have a function that checks if the data meets certain thresholds. If the data passes, the function returns the original DataFrame. If the thresholds are breached, it returns a different DataFrame containing logs or failure statistics.
- The main challenge is that these two DataFrames have different schemas, which is causing difficulties in the pipeline.
- I only want to initiate the logs-related steps in DLT when the threshold check fails. Currently, my pipeline writes both outputs regardless: one for the logs and one for the passed data, even when the passed-data count is zero. This isn't the behavior I want.
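For context, the check function behaves roughly like this. This is a simplified pure-Python sketch, with plain dicts standing in for Spark DataFrames; the function name, column names, and threshold are all illustrative, not my actual code:

```python
# Stand-in for the threshold check: rows are plain dicts instead of
# Spark DataFrames, but the shape of the problem is the same -- the
# two possible return values have different schemas.

def check_thresholds(rows, max_null_fraction=0.1):
    """Return (passed, result), where `result` has one of two schemas:
    - on success: the original rows (id, value, ...)
    - on failure: failure statistics (check_name, observed, threshold)
    """
    nulls = sum(1 for r in rows if r["value"] is None)
    observed = nulls / len(rows) if rows else 0.0
    if observed <= max_null_fraction:
        # Threshold met: pass the original data through unchanged.
        return True, rows
    # Threshold breached: return log/statistics rows with a
    # completely different schema.
    return False, [{
        "check_name": "max_null_fraction",
        "observed": observed,
        "threshold": max_null_fraction,
    }]
```

In the real pipeline each branch is materialized as its own DLT table, which is why both outputs get written every run even when one of them is empty.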
Question:
Is there a way to structure the DLT pipeline so that the logs path (graph or process) is only initiated when the threshold check fails? Ideally, I'm looking for a nested or conditional DLT step that only runs when the threshold validation fails.
- The fact that DLT doesn't have built-in flow-control mechanisms like traditional ETL tools is part of the challenge.
Any guidance or best practices for achieving this would be greatly appreciated!
Thanks in advance for your help!