cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Best practice for creating SQL views on top of continuously running Spark Structured Streaming jobs

mnissen1337
New Contributor III

I am working with a continuously running Spark Structured Streaming job in Databricks, deployed as a standalone job using continuous trigger mode via Databricks Asset Bundles (DABs).

On top of the streaming output table (created via writeStream), I want to define a SQL view. However, I am unsure about the best practice for handling this in a CI/CD-friendly way.

The core challenge is that the streaming job is designed to run continuously and therefore never reaches a terminal โ€œsuccessโ€ state. Because of this, it cannot easily be orchestrated within a multi-task job where a downstream notebook task depends on its successful completion to create the view.

I have considered a few possible approaches:

  • Pre-defining the table and view in a separate notebook task that the streaming job depends on. This works, but it requires manual schema management, whereas ideally I would like Spark to infer and manage the schema automatically when creating the table via writeStream.
  • Creating a separate job/notebook that waits for the table to exist and then creates the view, potentially using retry logic or a polling loop. However, since Databricks jobs do not support a true โ€œrun once after deploymentโ€ pattern in a clean way, this approach feels fragile.
  • Triggering a post-deployment step via the Databricks CLI to run a job that creates the view after deployment. While viable, this would require changes to the existing CI/CD pipeline, which I would prefer to avoid.

What is the recommended or most elegant way to handle this pattern in Databricks when working with continuously running streaming jobs and downstream SQL views in a CI/CD setup using DABs?

1 REPLY 1

amirabedhiafi
Contributor

Hello @mnissen1337  !

Have you though about decoupling view DDL from the continuously running streaming job ?

Do not make the view creation a downstream task of the stream because continuous jobs are not meant to reach success and in DBKS  continuous trigger jobs cannot use normal task dependencies.

Instead you can manage the view as deployment time, idempotent DDL. In other words you can create the target delta table explicitly or run a small bootstrap job once to create it, then deploy CREATE OR REPLACE VIEW ... separately before starting or restarting the stream. In prod, I would avoid relying on streaming schema inference as the contract for a downstream view and define the table schema deliberately and allow controlled schema evolution only when needed.

If you want a more declarative DBKS native model, think about lakeflow declarative pipelines or streaming tables and materialized views where streaming tables and derived SQL objects are managed together as pipeline assets. (

If this answer resolves your question, could you please mark it as โ€œAccept as Solutionโ€? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP