@Erik wrote:
This is partly a question, partly a feature request: How do you guys handle streaming checkpoints in combination with Unity Catalog managed tables?
It seems like the only way is to create a volume, and manually specify paths in it as streaming checkpoints. Do you use a single volume per catalog? A single volume per schema, or even one volume per table?
And how do you handle cleanup of the streaming checkpoints when you drop a table? You go in as admin and manually delete the streaming checkpoint?
So for the feature request to Databricks:
For managed tables in Unity Catalog it would be great if there was a function where you could provide the catalog.schema.table and a checkpoint name, and it would return a path you could use as a streaming checkpoint location. If the table gets dropped, this location should also be deleted. In other words, managed tables should have the option of managed checkpoint locations.
Hello!
It sounds like you're trying to streamline the process of handling streaming checkpoints with Unity Catalog managed tables in Databricks. Currently, it seems like the only way is to create a volume manually and specify paths inside it as checkpoint locations. This can be cumbersome, especially when managing multiple tables.
For your feature request, it would indeed be helpful if Databricks provided a function where you could input the catalog.schema.table and a checkpoint name, and it would automatically generate a path for the streaming checkpoint. Additionally, having the system handle the cleanup of these checkpoints when a table is dropped would greatly simplify maintenance.
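Until something like that exists, a small helper can at least make the manual approach consistent. Below is a minimal sketch, not a Databricks API: the function name `checkpoint_path`, the volume name `checkpoints`, and the choice of one volume per schema are all assumptions for illustration. It only relies on the `/Volumes/<catalog>/<schema>/<volume>/...` path convention for Unity Catalog volumes.

```python
def checkpoint_path(table: str, checkpoint_name: str, volume: str = "checkpoints") -> str:
    """Build a checkpoint path inside a Unity Catalog volume.

    Hypothetical helper: assumes one volume (default name "checkpoints")
    per schema, created beforehand by someone with the right privileges.
    """
    parts = table.split(".")
    # Require a fully qualified three-level name: catalog.schema.table
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    if not checkpoint_name or "/" in checkpoint_name:
        raise ValueError(f"invalid checkpoint name: {checkpoint_name!r}")
    catalog, schema, table_name = parts
    # One directory per table, one subdirectory per stream writing to it
    return f"/Volumes/{catalog}/{schema}/{volume}/{table_name}/{checkpoint_name}"


if __name__ == "__main__":
    print(checkpoint_path("main.sales.orders", "ingest"))
```

The returned path can be passed to `.option("checkpointLocation", path)` on the stream writer. Cleanup is still manual with this approach: when you drop the table, you would also have to remove the table's directory in the volume yourself, for example with `dbutils.fs.rm(path, True)` from a notebook.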