
Best practice for creating SQL views on top of continuously running Spark Structured Streaming jobs

mnissen1337
Visitor

I am working with a continuously running Spark Structured Streaming job in Databricks, deployed as a standalone job using continuous trigger mode via Databricks Asset Bundles (DABs).

On top of the streaming output table (created via writeStream), I want to define a SQL view. However, I am unsure about the best practice for handling this in a CI/CD-friendly way.

The core challenge is that the streaming job is designed to run continuously and therefore never reaches a terminal “success” state. Because of this, it cannot easily be orchestrated within a multi-task job where a downstream notebook task depends on its successful completion to create the view.

I have considered a few possible approaches:

  • Pre-defining the table and view in a separate notebook task that the streaming job depends on. This works, but it requires manual schema management, whereas ideally I would like Spark to infer and manage the schema automatically when creating the table via writeStream.
  • Creating a separate job/notebook that waits for the table to exist and then creates the view, potentially using retry logic or a polling loop. However, since Databricks jobs do not support a true “run once after deployment” pattern in a clean way, this approach feels fragile.
  • Triggering a post-deployment step via the Databricks CLI to run a job that creates the view after deployment. While viable, this would require changes to the existing CI/CD pipeline, which I would prefer to avoid.
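
To make the second option concrete, here is a minimal sketch of the polling approach, not a recommended solution. The function name, the timeout and poll defaults, the `SELECT *` view body, and the table and view names in the usage comment are all hypothetical. The existence check and the SQL runner are passed in as callables so the loop can be exercised without a cluster; in a notebook they would presumably be `spark.catalog.tableExists` and `spark.sql`.

```python
import time
from typing import Callable


def create_view_when_table_exists(
    table_exists: Callable[[str], bool],
    run_sql: Callable[[str], object],
    table: str,
    view: str,
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `table` exists, then create `view` on top of it.

    Returns True if the view was created, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while not table_exists(table):
        if clock() >= deadline:
            return False  # table never appeared; let the caller fail or retry
        sleep(poll_s)
    # Hypothetical view body; replace with the real SELECT.
    run_sql(f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {table}")
    return True


# In a Databricks notebook task (assumed names, not from my actual setup):
# created = create_view_when_table_exists(
#     spark.catalog.tableExists, spark.sql,
#     "main.demo.events", "main.demo.events_v",
# )
# if not created:
#     raise TimeoutError("streaming output table did not appear in time")
```

Because the statement is `CREATE OR REPLACE VIEW`, rerunning the task after each deployment is harmless, but the job still has to be triggered somehow, which is the part that feels fragile.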

What is the recommended or most elegant way to handle this pattern in Databricks when working with continuously running streaming jobs and downstream SQL views in a CI/CD setup using DABs?
