All my questions are about this code block:
import dlt

@dlt.append_flow(target="target_table")
def flow_01():
    return spark.readStream.table("table_01")

@dlt.append_flow(target="target_table")
def flow_02():
    return spark.readStream.table("table_02")
The first question: can I manually inspect, update, or delete the checkpoint of the streaming table reads in the code above?
I suppose there is no way to specify the checkpoint location, because I cannot find any documentation about this feature. I tried looking under "Structured Streaming checkpoints" and "Configure a Delta Live Tables pipeline", but found nothing there.
The reason I want this is for troubleshooting. For example, I want to monitor the daily reads from a specific streaming table: how much data was read this run, where it started and where it ended. Also, if something goes wrong, I could delete the checkpoint as a reset measure. A full refresh can solve that problem, but if I could access the checkpoint it might give me a lot of insight into how the pipeline is running.
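For comparison, outside of DLT I can pin and inspect a checkpoint myself in plain Structured Streaming (the path and table names below are just placeholders); I could not find an equivalent option for append_flow:

# Plain Structured Streaming (not DLT): the checkpoint location is set explicitly,
# so I can list and inspect its offsets/commits files when troubleshooting.
checkpoint_path = "/tmp/checkpoints/target_table"  # placeholder path

(spark.readStream.table("table_01")
    .writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .toTable("target_table"))

# Inspect what the stream has committed so far (Databricks notebook utility).
display(dbutils.fs.ls(f"{checkpoint_path}/offsets"))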
The second question is similar: is there a way to monitor append_flow behavior, such as how much data flows from the source table to the target table in the daily job?
The reason I want this is that, after all the append flows finish, what I get from the target table is only the total incremental row count for that run. I cannot tell how much data each individual flow contributed, or how much time each one took.
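What I have been experimenting with so far is querying the pipeline's event log, since the flow_progress events there appear to carry per-flow row counts. This is only a sketch: it assumes the event_log() table-valued function is available for my pipeline, and the pipeline id and JSON column paths below are placeholders/assumptions on my side:

# Sketch: read per-flow metrics from the DLT event log.
# '<pipeline-id>' is a placeholder; the details:flow_progress.metrics path
# is my assumption based on the event log schema I have seen.
events = spark.sql("""
  SELECT
    timestamp,
    origin.flow_name                                      AS flow_name,
    details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
  FROM event_log('<pipeline-id>')
  WHERE event_type = 'flow_progress'
""")
display(events.orderBy("timestamp"))

If this is the intended way to break the total down by flow (flow_01 vs flow_02), a pointer to the relevant documentation would already help a lot.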
Is there a feature I have overlooked or could reference to accomplish my goal? Or is there any plan to add these features in the future?