Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT pipeline observability questions (and maybe suggestions)

guangyi
Contributor III

All my questions are about this code block:

import dlt

@dlt.append_flow(target="target_table")
def flow_01():
  return spark.readStream.table("table_01")

@dlt.append_flow(target="target_table")
def flow_02():
  return spark.readStream.table("table_02")

The first question: can I manually inspect, update, or delete the checkpoint of the streaming table reads in the code above?

I suppose there is no way of specifying the checkpoint location, because I cannot find any documentation for such a feature. I looked under Structured Streaming checkpoints and Configure a Delta Live Tables pipeline, but found nothing there.

The reason I want this is troubleshooting. For example, I want to monitor the daily reads from a specific streaming table: how much data was read this run, and from which offset it started and ended. Also, if something goes wrong, I could delete the checkpoint as a reset measure. A full refresh can solve that problem, but direct access to the checkpoint would give me a lot of insight into how the pipeline is running.
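For context, this is the kind of control I mean: outside DLT, a plain Structured Streaming job pins its checkpoint explicitly, so it can be inspected or deleted to reset the stream. The path and table names below are hypothetical examples, not something from my pipeline:

```python
checkpoint_path = "/tmp/checkpoints/flow_01"  # hypothetical user-managed path

def start_flow(spark):
    # Plain Structured Streaming (not DLT): the checkpoint location is
    # user-chosen, so its contents (offsets, commits) can be examined
    # directly, or the directory deleted to restart from scratch.
    return (spark.readStream.table("table_01")
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .toTable("target_table"))
```

In DLT the pipeline manages this location itself, which is exactly why I am asking whether it can still be reached.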

The second question is similar: is there a way to monitor the append_flow behavior, such as how much data flows from each source table to the target table in the daily job?

The reason I want this is that after all the append flows complete, all I get from the target table is a single total incremental row count for the run. I cannot tell how much data each individual flow contributed, or how long each one took.
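What I was hoping for is something like a per-flow breakdown from the DLT event log. As a sketch of what I mean (the table name is a placeholder, and I am assuming the event log exposes `flow_progress` events with a `num_output_rows` metric, per my reading of the docs):

```python
# Sketch: query the DLT event log for per-flow row counts.
# `my_catalog.my_schema.target_table` is a placeholder for a real
# streaming table managed by the pipeline.
per_flow_query = """
    SELECT timestamp,
           origin.flow_name,
           details:flow_progress.metrics.num_output_rows AS rows_written
    FROM event_log(TABLE(my_catalog.my_schema.target_table))
    WHERE event_type = 'flow_progress'
    ORDER BY timestamp
"""

def per_flow_rows(spark):
    # Must run inside a Databricks workspace with access to the
    # pipeline's event log.
    return spark.sql(per_flow_query)
```

If this is already the intended way to answer my question, a pointer to the per-flow timing fields would also help.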

Is there a feature I have overlooked, or one I could refer to, that accomplishes my goal? Or is there any plan to add these features in the future?

 

