04-30-2025 01:25 AM
Hi All,
I'm currently using DLT append flow to merge multiple streaming flows into one output.
While trying to turn the append flow into a dynamic function for scalability, the DLT append flow seems to raise some errors.
stat_table = f"{catalog}.{bronze_schema}.output"
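Roughly, the dynamic version I'm attempting looks like this (a simplified sketch; the catalog, schema, and source names are placeholders):

```python
import dlt

catalog = "my_catalog"               # placeholder
bronze_schema = "bronze"             # placeholder
sources = ["source_a", "source_b"]   # placeholder source tables

stat_table = f"{catalog}.{bronze_schema}.output"

dlt.create_streaming_table(name=stat_table)

def create_append_flow(source: str):
    # One append flow per source, all writing into the same target table.
    @dlt.append_flow(target=stat_table, name=f"{source}_flow")
    def flow():
        # `spark` is provided by the DLT pipeline runtime.
        return spark.readStream.table(f"{catalog}.{bronze_schema}.{source}")
    return flow

for source in sources:
    create_append_flow(source)
```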
04-30-2025 11:20 AM
Use @dlt.append_flow for each source, adapting your logic for non-Kafka scenarios. Apply .withWatermark(event_time_column, "time_limit") to the streaming DataFrame of each source to manage late-arriving data; watermarking is critical for streaming aggregations.
Define @append_flow-decorated functions for each source, ensuring they return a DataFrame compatible with append mode, and make sure each @append_flow writes to distinct portions of data in the target table, or refactor the pipeline to combine streams upstream if possible.
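Something along these lines, as a minimal sketch (the source list, event-time column, and the windowed count are placeholders to adapt to your pipeline):

```python
import dlt
from pyspark.sql import functions as F

sources = ["source_a", "source_b"]   # placeholder list of source views
event_time_column = "event_time"     # placeholder event-time column

dlt.create_streaming_table(name="output")

def register_flow(source: str):
    # A factory function avoids the late-binding pitfall of defining
    # decorated functions directly inside the loop.
    @dlt.append_flow(target="output", name=f"{source}_flow")
    def flow():
        df = dlt.read_stream(source)
        # If a source lacks an event-time column, derive a pseudo event
        # time from the ingestion timestamp.
        if event_time_column not in df.columns:
            df = df.withColumn(event_time_column, F.current_timestamp())
        # Watermark before the aggregation so append mode is supported.
        return (
            df.withWatermark(event_time_column, "10 minutes")
              .groupBy(F.window(event_time_column, "10 minutes"))
              .count()
        )
    return flow

for source in sources:
    register_flow(source)
```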
Here are the key highlights of this approach:
- A withWatermark is applied to every source, using an appropriate column (e.g., event_time_column) and a time limit like "10 minutes" to ensure support for append mode.
- Each flow has a unique name (e.g., f"{source}_flow"), ensuring that checkpoints are distinct and consistent with the flow definition, so multiple flows can write to the same target without conflict.
- If your sources don't have event_time columns, consider generating a pseudo-event-time column for them based on ingestion timestamps.
- Examples of handling multiple flows to a single target are also described in other contexts.
05-07-2025 08:46 PM
Hi Big Roux,
Thank you for your explanation and sample code.
However, I tested with your code but I'm still getting the same error:
[STREAMING_OUTPUT_MODE.UNSUPPORTED_OPERATION] Invalid streaming output mode: append. This output mode is not supported for streaming aggregations without watermark on streaming DataFrames/DataSets. SQLSTATE: 42KDE
I have applied the watermarking when reading the source view table.
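For reference, the watermark is applied roughly like this inside the flow (simplified; the view, column, and key names are placeholders):

```python
import dlt

dlt.create_streaming_table(name="output")

@dlt.append_flow(target="output", name="source_a_flow")
def source_a_flow():
    # Watermark applied when reading from the source view ...
    df = dlt.read_stream("source_a_view").withWatermark("event_time", "10 minutes")
    # ... followed by transformations and an aggregation; this is where the
    # append-mode error is still raised.
    return df.groupBy("some_key").count()
```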
Are there any other criteria that need to be applied to the f"{source}" table so that the append_flow would work?
Thank you.
05-07-2025 09:00 PM
Hi Big Roux,
Following up on my previous response, I think it might help if I describe the situation more clearly.
The pipeline starts with a read_stream from a Delta table and creates a view without applying watermarking.
The next stage involves using create_streaming_table and the append_flow. I'm trying to read from different sources and create separate views, but I need to aggregate the results after doing some transformations on these sources.
Delta_tables -> Views -> Streaming_table(append_flow)
In this case, do I need to apply the watermarking in the first step, when I read stream from my actual source (the Delta table)?
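To make the shape concrete, here is a simplified sketch of the current pipeline (all table, view, and column names are placeholders):

```python
import dlt

# Stage 1: stream from the source Delta table into a view, currently
# without a watermark.
@dlt.view(name="source_a_view")
def source_a_view():
    # `spark` is provided by the DLT pipeline runtime.
    return spark.readStream.table("catalog.schema.source_a")

# Stage 2: shared streaming target plus one append flow per source.
dlt.create_streaming_table(name="aggregated_output")

@dlt.append_flow(target="aggregated_output", name="source_a_flow")
def source_a_flow():
    return (
        dlt.read_stream("source_a_view")
           .withWatermark("event_time", "10 minutes")  # watermark added here
           .groupBy("some_key")  # aggregation after some transformation
           .count()
    )
```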
I hope this gives you a clearer picture of how my current pipeline works.
Thank you.
Regards,
Dejian