Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Moving existing Delta Live Table to Asset Bundle

Isa1
New Contributor

Hi!

I am creating an Asset Bundle, which also includes my streaming Delta Live Table Pipelines. I want to move these DLT pipelines to the Asset Bundle, without having to run my DLT streaming Pipeline on all historical files (this takes a lot of compute and time). Is there a way to migrate an existing DLT Pipeline to Asset Bundles? 

1 ACCEPTED SOLUTION

Accepted Solutions

You could try the bind command. It links bundle-defined jobs and pipelines to existing jobs and pipelines in the Databricks workspace, so that they become managed by Databricks Asset Bundles.

https://docs.gcp.databricks.com/en/dev-tools/cli/bundle-commands.html#bind-bundle-resourcesbundle co...

databricks bundle deployment bind [resource-key] [resource-id]
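As a rough sketch of how the bind step might look in practice: the resource key, pipeline ID, and target name below are hypothetical placeholders, and the `run` helper is only an illustration that prints each command unless `DRY_RUN=0` is set. It assumes the bundle's `databricks.yml` already defines a pipeline under the given key.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values; substitute your own.
RESOURCE_KEY="${RESOURCE_KEY:-my_dlt_pipeline}"            # key under resources.pipelines in databricks.yml
PIPELINE_ID="${PIPELINE_ID:-your-existing-pipeline-id}"    # ID of the pipeline already in the workspace
TARGET="${TARGET:-dev}"                                    # bundle target to bind in

# Illustrative helper: print commands by default, execute only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Link the bundle-defined pipeline to the existing workspace pipeline.
run databricks bundle deployment bind "$RESOURCE_KEY" "$PIPELINE_ID" -t "$TARGET"

# Later deploys then update the bound pipeline instead of creating a new one.
run databricks bundle deploy -t "$TARGET"
```

Reviewing the printed commands before setting `DRY_RUN=0` is a cheap safeguard, since a deploy after binding applies the bundle's configuration to the existing pipeline.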



6 REPLIES

Walter_C
Databricks Employee

Yes, you can migrate an existing Delta Live Table (DLT) pipeline to an Asset Bundle without having to reprocess all historical files. Here are the steps to achieve this:

  1. Create a Databricks Asset Bundle: Use the Databricks CLI to initialize a new bundle. This will create a databricks.yml file in the root of your project, which will be used to define your Databricks resources, including your DLT pipelines.

  2. Define the DLT Pipeline in the Bundle: In the databricks.yml file, you will need to define your DLT pipeline. This involves specifying the pipeline's configuration, such as the path to the notebook or script that defines the pipeline logic.

  3. Deploy the Bundle: Use the Databricks CLI to deploy the bundle to your target environment. This will create the necessary resources in your Databricks workspace based on the definitions in the databricks.yml file.

  4. Run the Pipeline: Once the bundle is deployed, you can run the DLT pipeline from the Databricks extension panel or using the CLI. The pipeline continues from its last processed state, without reprocessing all historical files, only if the bundle resource is linked to the existing pipeline; a newly created pipeline starts from scratch.
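The steps above could be sketched roughly as follows. Every name and path here (the `my_bundle` directory, the `my_dlt_pipeline` resource key, the notebook path, the `dev` target) is a hypothetical placeholder, and the `run` helper only prints commands unless `DRY_RUN=0` is set. In a real project `databricks bundle init` would normally scaffold the bundle; this sketch writes a minimal `databricks.yml` by hand instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

BUNDLE_DIR="${BUNDLE_DIR:-./my_bundle}"   # hypothetical project directory
TARGET="dev"                              # hypothetical target name
RESOURCE_KEY="my_dlt_pipeline"            # must match the key in the YAML below

mkdir -p "$BUNDLE_DIR"

# Minimal bundle configuration; all names and paths are placeholders.
cat > "$BUNDLE_DIR/databricks.yml" <<'YAML'
bundle:
  name: my_bundle

resources:
  pipelines:
    my_dlt_pipeline:
      name: my-dlt-pipeline
      libraries:
        - notebook:
            path: ./src/dlt_pipeline.py

targets:
  dev:
    default: true
YAML

# Illustrative helper: print commands by default, execute only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cd "$BUNDLE_DIR"
run databricks bundle validate -t "$TARGET"             # check the configuration
run databricks bundle deploy -t "$TARGET"               # create or update resources
run databricks bundle run -t "$TARGET" "$RESOURCE_KEY"  # start a pipeline update
```

Note that deploying this without binding first creates a new pipeline, which is what triggers the full reprocessing discussed in this thread.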

Isa1
New Contributor

I have done these steps, but my DLT still took a long time to process. However, the path to the notebook with my pipeline logic has changed because I am deploying it as a bundle. Is this a problem? Also, the name of the pipeline changed because of a prefix I added in the Asset Bundle. 

szymon_dybczak
Contributor III

Hi @Isa1 ,

If you have existing pipelines that were created using the Databricks user interface or API that you want to move to bundles, you must define them in a bundle's configuration files. Databricks recommends that you first create a bundle using the steps below and then validate whether the bundle works. You can then add additional definitions, notebooks, and other sources to the bundle.

You can follow the official documentation and repeat the steps described there:

Develop Delta Live Tables pipelines with Databricks Asset Bundles | Databricks on AWS

Walter_C
Databricks Employee

Changing the path to the notebook or the name of the pipeline in your Delta Live Tables (DLT) pipeline can indeed cause issues. Specifically, either change can lead to the pipeline being recreated.

And to add one thing: in Delta Live Tables, checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>. So if you want to avoid running your pipeline from the start, you need to use the bind command, because otherwise the new pipeline name will create a new checkpoint directory.
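Because the checkpoints belong to the existing pipeline, the bind command needs that pipeline's ID. One possible way to look it up from the CLI is sketched below. The pipeline name is a hypothetical placeholder, the `--filter` expression and the `pipeline_id` field name are assumptions to verify against your CLI version, and the `run` helper only prints the command unless `DRY_RUN=0` is set. The ID is also shown on the pipeline's details page in the workspace UI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical name of the pipeline that already exists in the workspace.
PIPELINE_NAME="${PIPELINE_NAME:-my-existing-dlt-pipeline}"

# Illustrative helper: print commands by default, execute only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List pipelines matching the name as JSON; the pipeline_id of the match
# is the value to pass as [resource-id] to the bind command.
run databricks pipelines list-pipelines \
  --filter "name LIKE '${PIPELINE_NAME}'" \
  --output json
```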

