I am working with a continuously running Spark Structured Streaming job in Databricks, deployed as a standalone job using continuous trigger mode via Databricks Asset Bundles (DABs).
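For context, the bundle definition looks roughly like this (job key, name, and notebook path are placeholders; the `continuous` block is what keeps the job running indefinitely):

```yaml
# resources/streaming_job.yml — sketch with placeholder names/paths
resources:
  jobs:
    streaming_job:
      name: streaming_job
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: run_stream
          notebook_task:
            notebook_path: ../src/stream_notebook.py
```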
On top of the streaming output table (created via writeStream), I want to define a SQL view. However, I am unsure about the best practice for handling this in a CI/CD-friendly way.
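For concreteness, the view itself is trivial, just a plain SQL view over the streaming output table (catalog, schema, and table names below are placeholders):

```sql
-- Placeholder names; the real table is created by writeStream
CREATE OR REPLACE VIEW main.bronze.events_view AS
SELECT * FROM main.bronze.events;
```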
The core challenge is that the streaming job is designed to run continuously and therefore never reaches a terminal “success” state. Because of this, it cannot easily be orchestrated within a multi-task job: a downstream notebook task gated on the streaming task's successful completion would never fire, so it can never create the view.
I have considered a few possible approaches:
- Pre-defining the table and view in a separate notebook task that the streaming job depends on. This works, but it means managing the schema manually, whereas ideally Spark would infer and manage the schema automatically when writeStream creates the table.
- Creating a separate job/notebook that waits for the table to exist and then creates the view, using retry logic or a polling loop. However, since Databricks jobs do not cleanly support a “run once after deployment” pattern, this approach feels fragile.
- Triggering a post-deployment step via the Databricks CLI to run a job that creates the view after deployment. While viable, this would require changes to the existing CI/CD pipeline, which I would prefer to avoid.
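For reference, the first approach would mean an upstream setup task along these lines, with the schema duplicated by hand (all names and columns are hypothetical):

```sql
-- Schema must be kept in sync with the stream manually
CREATE TABLE IF NOT EXISTS main.bronze.events (
  event_id STRING,
  event_ts TIMESTAMP,
  payload  STRING
);

CREATE OR REPLACE VIEW main.bronze.events_view AS
SELECT event_id, event_ts, payload
FROM main.bronze.events;
```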
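The second (polling) approach could be sketched like this. The polling helper itself is plain Python; `spark.catalog.tableExists` is the actual PySpark check I would plug in, while the table and view names are placeholders:

```python
import time


def wait_for_table(exists_fn, timeout_s=600, poll_s=10):
    """Poll until exists_fn() returns True or the timeout elapses.

    Returns True if the table appeared in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if exists_fn():
            return True
        time.sleep(poll_s)
    return False


# In a Databricks notebook this would be wired up roughly as
# (placeholder names, assuming the `spark` session is available):
#
# if wait_for_table(lambda: spark.catalog.tableExists("main.bronze.events")):
#     spark.sql("""
#         CREATE OR REPLACE VIEW main.bronze.events_view AS
#         SELECT * FROM main.bronze.events
#     """)
# else:
#     raise TimeoutError("streaming output table never appeared")
```

The fragility I mentioned is exactly here: the job that runs this loop still has to be triggered somehow after each deployment.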
What is the recommended or most elegant way to handle this pattern in Databricks when working with continuously running streaming jobs and downstream SQL views in a CI/CD setup using DABs?