Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managing streaming checkpoints with Unity Catalog

Erik
Valued Contributor III

This is partly a question, partly a feature request: How do you guys handle streaming checkpoints in combination with Unity Catalog managed tables?

It seems like the only way is to create a volume, and manually specify paths in it as streaming checkpoints. Do you use a single volume per catalog? A single volume per schema, or even one volume per table?

And how do you handle cleanup of the streaming checkpoints when you drop a table? Do you go in as admin and manually delete the checkpoint directory?

So for the feature request to Databricks:

For managed tables in Unity Catalog it would be great if there was a function where you could provide the catalog.schema.table and a checkpoint name, and it would return a path you could use as a streaming checkpoint location. If the table gets dropped, this location should also be deleted. In other words, managed tables should have the option of managed checkpoint locations.
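
To make the manual approach and the requested behaviour concrete, here is a minimal Python sketch. It is not a Databricks API. The function names, the volume name `checkpoints`, and the one-directory-per-table layout are all assumptions chosen for illustration; the only thing taken from Databricks itself is the `/Volumes/<catalog>/<schema>/<volume>/...` path convention. The cleanup part also assumes the volume is reachable through ordinary file APIs from the compute you run it on, and that table names contain no backtick-quoted dots.

```python
import shutil
from pathlib import Path


def _split_table_name(table: str) -> tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts (no backtick quoting)."""
    parts = table.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    return parts[0], parts[1], parts[2]


def checkpoint_path(
    table: str,
    checkpoint_name: str,
    volume: str = "checkpoints",  # hypothetical volume, one per schema
    root: str = "/Volumes",
) -> str:
    """Derive a checkpoint location from the table name and a checkpoint name."""
    catalog, schema, table_name = _split_table_name(table)
    return f"{root}/{catalog}/{schema}/{volume}/{table_name}/{checkpoint_name}"


def remove_table_checkpoints(
    table: str,
    volume: str = "checkpoints",  # must match the volume used above
    root: str = "/Volumes",
) -> bool:
    """Delete every checkpoint directory kept for a table.

    Returns True if a directory was removed, False if there was nothing to remove.
    Meant to be called by hand (or from a wrapper) right after dropping the table.
    """
    catalog, schema, table_name = _split_table_name(table)
    table_dir = Path(root) / catalog / schema / volume / table_name
    if table_dir.is_dir():
        shutil.rmtree(table_dir)
        return True
    return False
```

With this layout the result of `checkpoint_path("main.sales.orders", "ingest")` would be passed as the `checkpointLocation` option on `writeStream`, and `remove_table_checkpoints("main.sales.orders")` would be run after `DROP TABLE`. One volume per schema keeps the checkpoint next to the table it belongs to, but a per-catalog volume would work the same way with a different path template.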

3 REPLIES

michelle653burk
New Contributor

Hello!

It sounds like you're trying to streamline the process of handling streaming checkpoints with Unity Catalog managed tables in Databricks. Currently, it seems like the only way is to manually create volumes and specify paths for streaming checkpoints. This can be cumbersome, especially when managing multiple tables.

For your feature request, it would indeed be helpful if Databricks provided a function where you could input the catalog.schema.table and a checkpoint name, and it would automatically generate a path for the streaming checkpoint. Additionally, having the system handle the cleanup of these checkpoints when a table is dropped would greatly simplify maintenance.


@michelle653burk wrote:

Hello!

It sounds like you're trying to streamline the process of handling streaming checkpoints with Unity Catalog managed tables in Databricks. Currently, it seems like the only way is to manually create volumes and specify paths for streaming checkpoints. This can be cumbersome, especially when managing multiple tables.

For your feature request, it would indeed be helpful if Databricks provided a function where you could input the catalog.schema.table and a checkpoint name, and it would automatically generate a path for the streaming checkpoint. Additionally, having the system handle the cleanup of these checkpoints when a table is dropped would greatly simplify maintenance.


Was that helpful to you?

cgrant
Databricks Employee

For Structured Streaming applications, this would be a nice feature.

Delta Live Tables manages checkpoints for you out of the box, so you don't have to reason about checkpoints at all. I would recommend checking it out!
