Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT pipeline observability questions (and maybe suggestions)

guangyi
Contributor III

All my questions are about this code block:

import dlt

@dlt.append_flow(target="target_table")
def flow_01():
  return spark.readStream.table("table_01")

@dlt.append_flow(target="target_table")
def flow_02():
  return spark.readStream.table("table_02")

The first question is: can I manually check, update, or delete the checkpoint of the streaming table reads in the code above?

I suppose there is no way of specifying the checkpoint location, because I cannot find any documentation about this feature. I tried to find resources under Structured Streaming checkpoints and Configure a Delta Live Tables pipeline, but found nothing there.

The reason I want to do this is troubleshooting. For example, I want to monitor the daily reads from a specific streaming table: how much data was read this time, where the read started and where it ended. Also, if something goes wrong, I could delete the checkpoint as a reset measure. A full refresh can solve this problem, but access to the checkpoint would give me a lot of insight into how the pipeline is running.

The second question is similar: is there a way to monitor the append_flow behavior, such as how much data flowed from each source table to the target table in the daily job?

The reason I want this is that after all the append flows have completed, all I get from the target table is a single total incremental count for the run. I cannot tell how much data each individual flow contributed, or how much time each one took.

Is there any feature I have overlooked, or can refer to, that would accomplish my goal? Or are there plans to add such features in the future?

 

2 REPLIES

Nam_Nguyen
Databricks Employee

Hi @guangyi, I'll be looking into this, and I'll get back to you with an answer.

Nam_Nguyen
Databricks Employee

Hello @guangyi, I am getting back to you with some insights:

  • Regarding your first question about checkpointing:
    • You can manually check the checkpoint location of your streaming table. The checkpoints of your Delta Live Tables are under Storage location in the Destination section of the pipeline settings. Each table gets a dedicated directory under <storage_location>/checkpoints/<dlt_table_name>.
    • As for updating or deleting the checkpoint, it's not technically feasible AFAIK, and it's not something that we would recommend either. It could cause unexpected behavior in your streaming pipelines, and this type of error is difficult to troubleshoot.
    • To closely monitor your streaming pipelines, there is some useful information you can find in the event log of Delta Live Tables (some examples here: https://docs.databricks.com/en/delta-live-tables/observability.html). For example, you can query the event log (it can be in either the Hive metastore or Unity Catalog) and find the timestamp at which each event was processed. In addition, you can create your own custom monitoring rules via event hooks: https://docs.databricks.com/en/delta-live-tables/event-hooks.html
  • For your second question on the append flow, there isn't an out-of-the-box solution to differentiate the contributions of different flows, but I encourage you to look into the event log of the target table. You can check the event log of your table in Unity Catalog by querying:
    SELECT * FROM event_log(TABLE(my_catalog.my_schema.table1))
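To make the per-flow breakdown concrete, here is a minimal sketch of how that event-log query could be extended to sum rows written per flow. It relies on the documented event-log schema (`event_type = 'flow_progress'`, `origin.flow_name`, and the `details:flow_progress.metrics.num_output_rows` JSON path); the helper function names and the example table name are my own, not a Databricks API.

```python
# Sketch: aggregate rows written per append flow from the DLT event log.
# Assumes the documented event-log schema; helper names are illustrative.
import json
from collections import defaultdict

def per_flow_query(table_fqn: str) -> str:
    """Build SQL that sums output rows per flow from a UC table's event log."""
    return (
        "SELECT origin.flow_name AS flow_name, "
        "SUM(details:flow_progress.metrics.num_output_rows::long) AS rows_written "
        f"FROM event_log(TABLE({table_fqn})) "
        "WHERE event_type = 'flow_progress' "
        "GROUP BY origin.flow_name"
    )

def sum_rows_by_flow(events):
    """The same aggregation in plain Python, for event rows already fetched.
    Each event: {'event_type': ..., 'origin': {'flow_name': ...},
                 'details': <JSON string>}."""
    totals = defaultdict(int)
    for e in events:
        if e.get("event_type") != "flow_progress":
            continue
        details = json.loads(e["details"])
        metrics = details.get("flow_progress", {}).get("metrics", {})
        rows = metrics.get("num_output_rows")
        if rows is not None:
            totals[e["origin"]["flow_name"]] += rows
    return dict(totals)
```

In a notebook you would run spark.sql(per_flow_query("my_catalog.my_schema.table1")) and get one row per flow (flow_01, flow_02, ...) with its total rows written, which is exactly the per-flow contribution the question asks about.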

I hope that my answers are clear and address some of your points. If you want to discuss further, please let me know!
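As a small follow-up on the event hooks mentioned above: a minimal hook could log every flow_progress event as the pipeline runs. The `@dlt.on_event_hook` decorator and the single event-dict argument follow the event-hooks docs linked earlier; the helper name and the printed format are illustrative. The filtering logic is kept in a plain-Python function so it can be tested outside a pipeline.

```python
# Sketch of the filtering logic for a DLT event hook; names are illustrative.
def flow_progress_summary(event: dict):
    """Return (flow_name, message) for flow_progress events, else None."""
    if event.get("event_type") != "flow_progress":
        return None
    flow = event.get("origin", {}).get("flow_name", "<unknown>")
    return flow, event.get("message", "")

# Inside pipeline source code you would register it roughly like this
# (`dlt` is only importable inside a pipeline, so it is commented out here):
#
# import dlt
#
# @dlt.on_event_hook(max_allowable_consecutive_failures=None)
# def log_flow_progress(event):
#     summary = flow_progress_summary(event)
#     if summary is not None:
#         print(f"[flow_progress] {summary[0]}: {summary[1]}")
```

This gives a running per-flow trace in the driver logs without waiting to query the event log after the update finishes.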
