Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
Shu_Li
Databricks Employee

Have production Databricks pipelines running in triggered mode and need to change the refresh frequency of some Streaming Tables (STs) without a full refresh?

This article walks you through an easy, step-by-step guide to doing exactly that with the Databricks pipeline move feature.

Pipeline Move

  • Step 1: Stop the source pipeline if it is running.
  • Step 2: Confirm the source pipeline is stopped, then remove the ST definition from the source pipeline’s notebook or file.
  • Step 3: Reassign the ST from the source pipeline to the destination pipeline, using the destination pipeline’s ID as the property value:

ALTER STREAMING TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES ("pipelines.pipelineId"="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");

  • Step 4: Add the ST’s definition to the destination pipeline’s notebook or file.

This will allow the destination pipeline to update the ST instead of the source pipeline.
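
If you move many tables, it can help to generate the Step 3 statement rather than typing it by hand. The following is a minimal sketch, not part of any Databricks API: `build_move_statement` is a hypothetical helper name, and the pipeline ID in the usage lines is a made-up placeholder. It only builds the SQL string; running it (in a SQL editor or however you normally execute SQL) is left to you.

```python
import re
import uuid


def build_move_statement(table_name: str, destination_pipeline_id: str) -> str:
    """Build the ALTER STREAMING TABLE statement from Step 3 (hypothetical helper)."""
    # Simplification: accept only plain three-level names (catalog.schema.table).
    # Names that need backtick quoting are rejected by this check.
    if not re.fullmatch(r"\w+\.\w+\.\w+", table_name):
        raise ValueError(f"expected catalog.schema.table, got {table_name!r}")
    # uuid.UUID raises ValueError for a malformed ID; str() gives the
    # canonical lowercase, hyphenated form.
    pipeline_id = str(uuid.UUID(destination_pipeline_id))
    return (
        f"ALTER STREAMING TABLE {table_name} "
        f'SET TBLPROPERTIES ("pipelines.pipelineId"="{pipeline_id}");'
    )


# Placeholder ID for illustration only; substitute your destination pipeline's ID.
statement = build_move_statement(
    "my_catalog.my_schema.my_table",
    "12345678-1234-1234-1234-1234567890ab",
)
print(statement)
```

Validating the ID before building the statement catches copy-paste mistakes early, before anything is run against the table.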

Just like that, you have moved the ST from the source pipeline to the destination pipeline in four simple steps, without a full refresh!

To use the Databricks pipeline move feature, the following requirements must be met:

  • Both the source and destination pipelines must be Unity Catalog (UC) pipelines.
  • Both pipelines must reside within the same Databricks workspace.
  • The user performing the operation must own both the source and destination pipelines.
  • The destination pipeline must use the default publishing mode, which enables publishing tables to multiple catalogs and schemas.
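
The requirements above can be turned into a small pre-flight checklist. This is a sketch under stated assumptions: `PipelineInfo` and `unmet_move_requirements` are hypothetical names, and the fields are values you would fill in yourself from the pipeline settings, not something returned by a Databricks API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineInfo:
    # Hypothetical summary of a pipeline, filled in by hand from its settings.
    workspace_id: str
    owner: str
    uses_unity_catalog: bool
    default_publishing_mode: bool


def unmet_move_requirements(
    source: PipelineInfo, destination: PipelineInfo, user: str
) -> List[str]:
    """Return the requirements that are not met; an empty list means all pass."""
    problems = []
    if not (source.uses_unity_catalog and destination.uses_unity_catalog):
        problems.append("both pipelines must be Unity Catalog pipelines")
    if source.workspace_id != destination.workspace_id:
        problems.append("both pipelines must be in the same workspace")
    if not (source.owner == user and destination.owner == user):
        problems.append("the user must own both pipelines")
    if not destination.default_publishing_mode:
        problems.append("the destination pipeline must use the default publishing mode")
    return problems


# Illustrative values only.
src = PipelineInfo("ws-1", "me@example.com", True, True)
dst = PipelineInfo("ws-1", "me@example.com", True, False)
print(unmet_move_requirements(src, dst, "me@example.com"))
```

Only the destination pipeline's publishing mode is checked, matching the requirement as stated in the list.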

Tips and Tricks

  • To move a temporary ST from the source pipeline to the destination pipeline, first set temporary=False, then perform Steps 1 to 4. For example: @dlt.table(name='my_temp_st', temporary=False)
  • To resolve the “Table already exist” error during the move, check that you followed Steps 1 to 4 in order: remove the ST definition from the source pipeline’s notebook or file BEFORE switching the pipeline ID to the destination pipeline with ALTER STREAMING TABLE.

The Databricks pipeline move feature is also useful when you want to split a large pipeline into smaller ones or merge several pipelines together.

Happy Pipelining!

 
