Databricks Community

Isa1 · 11-26-2024

Hi!

I am creating an Asset Bundle that also includes my streaming Delta Live Tables pipelines. I want to move these DLT pipelines into the Asset Bundle without having to rerun my streaming DLT pipelines over all historical files (this takes a lot of compute and time). Is there a way to migrate an existing DLT pipeline to Asset Bundles?


Walter_C · 11-26-2024

Yes, you can migrate an existing Delta Live Tables (DLT) pipeline to an Asset Bundle without having to reprocess all historical files. Here are the steps:

Create a Databricks Asset Bundle: Use the Databricks CLI to initialize a new bundle (databricks bundle init). This creates a databricks.yml file in the root of your project, which defines your Databricks resources, including your DLT pipelines.
Define the DLT pipeline in the bundle: In the databricks.yml file, define your DLT pipeline. This involves specifying the pipeline's configuration, such as the path to the notebook or script that defines the pipeline logic.
Deploy the bundle: Use the Databricks CLI to deploy the bundle to your target environment. This creates the necessary resources in your Databricks workspace based on the definitions in the databricks.yml file.
Run the pipeline: Once the bundle is deployed, you can run the DLT pipeline from the Databricks extension panel or with the CLI. The pipeline continues from its last processed state instead of reprocessing all historical files, provided the bundle manages the existing pipeline rather than a newly created one.
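The CLI side of these steps might look roughly like the following sketch. It assumes the Databricks CLI is installed and authenticated, that the interactive databricks bundle init has already been run once, and that the pipeline's resource key in databricks.yml is the hypothetical `my_dlt_pipeline` with a hypothetical `dev` target. With `dry_run=True` (the default) it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def bundle_workflow(pipeline_key: str, target: str = "dev") -> List[List[str]]:
    """Validate, deploy, and run a bundle-defined pipeline (key and target are hypothetical)."""
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
        # Starts a pipeline update; no full-refresh option is passed here.
        ["databricks", "bundle", "run", pipeline_key, "--target", target],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it, stopping on the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(bundle_workflow("my_dlt_pipeline"))
```

Whether the run picks up from existing state depends on which pipeline the resource key resolves to: a pipeline that the deploy step has just created has no streaming checkpoints yet, so it starts from the beginning.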

Isa1 · 11-26-2024

I have done these steps, but my DLT pipeline still took a long time to process. However, the path to the notebook with my pipeline logic changed because I am deploying it as a bundle. Is this a problem? The name of the pipeline also changed because of a prefix I added in the Asset Bundle.

szymon_dybczak · 11-26-2024

Hi @Isa1,

If you have existing pipelines that were created using the Databricks user interface or API and you want to move them to bundles, you must define them in a bundle’s configuration files. Databricks recommends that you first create a bundle using the steps in the documentation linked below and then validate whether the bundle works. You can then add additional definitions, notebooks, and other sources to the bundle.

You can follow the official documentation entry; just repeat the steps:

Develop Delta Live Tables pipelines with Databricks Asset Bundles | Databricks on AWS
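As a hedged illustration of such a configuration file, the snippet below writes a minimal databricks.yml that declares one DLT pipeline. The bundle name `dlt_migration`, the resource key and pipeline name `my_dlt_pipeline`, the notebook path, and the `prod` target are all hypothetical placeholders; a real configuration would copy the existing pipeline's settings (name, catalog or target schema, storage, libraries) so that the bundle describes the same pipeline.

```python
import textwrap
from pathlib import Path

# Minimal bundle configuration; every name and path here is a hypothetical placeholder.
DATABRICKS_YML = textwrap.dedent(
    """\
    bundle:
      name: dlt_migration

    resources:
      pipelines:
        my_dlt_pipeline:                # resource key used by bundle CLI commands
          name: my_dlt_pipeline         # ideally the existing pipeline's name
          libraries:
            - notebook:
                path: ./src/my_dlt_notebook.py

    targets:
      prod:
        default: true
    """
)


def write_bundle_config(project_root: Path) -> Path:
    """Write databricks.yml into the bundle's root directory and return its path."""
    config_path = project_root / "databricks.yml"
    config_path.write_text(DATABRICKS_YML)
    return config_path
```

Running databricks bundle validate against the resulting file is a cheap way to check the configuration before deploying anything.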

Walter_C · 11-26-2024

Changing the notebook path or the name of your Delta Live Tables (DLT) pipeline can indeed cause issues. Specifically, either change can lead to the pipeline being recreated.

szymon_dybczak · 11-26-2024

So maybe try the bind command? It links bundle-defined jobs and pipelines to existing jobs and pipelines in the Databricks workspace so that they become managed by Databricks Asset Bundles:

https://docs.gcp.databricks.com/en/dev-tools/cli/bundle-commands.html#bind-bundle-resources

databricks bundle deployment bind [resource-key] [resource-id]
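A minimal sketch of a bind-then-deploy sequence, assuming a hypothetical resource key `my_dlt_pipeline` defined under resources.pipelines in databricks.yml and a hypothetical pipeline ID (the real one comes from the existing pipeline in the workspace). Only the bind subcommand, deploy, and the `--target` option come from the Databricks CLI; the helper functions are illustrative. After binding, the next databricks bundle deploy should update the existing pipeline instead of creating a new one.

```python
import shlex
import subprocess
from typing import List, Optional


def bind_command(resource_key: str, pipeline_id: str, target: Optional[str] = None) -> List[str]:
    """Build `databricks bundle deployment bind <resource-key> <resource-id>`."""
    cmd = ["databricks", "bundle", "deployment", "bind", resource_key, pipeline_id]
    if target is not None:
        cmd += ["--target", target]
    return cmd


def migrate_existing_pipeline(
    resource_key: str,
    pipeline_id: str,
    target: Optional[str] = None,
    dry_run: bool = True,
) -> List[List[str]]:
    """Bind the bundle resource to the existing pipeline, then deploy the bundle."""
    deploy = ["databricks", "bundle", "deploy"]
    if target is not None:
        deploy += ["--target", target]
    steps = [bind_command(resource_key, pipeline_id, target), deploy]
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # bind may ask for confirmation interactively before linking the resource.
            subprocess.run(cmd, check=True)
    return steps
```

Keeping the bind step separate from the deploy step makes it easy to inspect the printed commands in dry-run mode before anything in the workspace changes.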

szymon_dybczak · 11-26-2024

And to add one thing: in Delta Live Tables, checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>. So if you want to avoid running your pipeline from the start, you need to use the bind command, because otherwise the new pipeline (with its new name) will create a new checkpoint directory.
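To make the checkpoint argument concrete, here is a small sketch of the directory layout described in this post. The storage locations and the table name `events` are hypothetical; the point is only that two pipelines with different storage locations resolve to different checkpoint directories for the same table.

```python
import posixpath


def checkpoint_dir(storage_location: str, table_name: str) -> str:
    """Return <storage_location>/checkpoints/<table_name>, the layout described in the post."""
    return posixpath.join(storage_location.rstrip("/"), "checkpoints", table_name)


# Hypothetical storage locations for the original pipeline and a newly created one.
old_dir = checkpoint_dir("dbfs:/pipelines/original-pipeline/", "events")
new_dir = checkpoint_dir("dbfs:/pipelines/new-pipeline", "events")

print(old_dir)  # dbfs:/pipelines/original-pipeline/checkpoints/events
print(new_dir)  # dbfs:/pipelines/new-pipeline/checkpoints/events
# Different directories mean the new pipeline finds no existing streaming state.
print(old_dir == new_dir)  # False
```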

Moving existing Delta Live Table to Asset Bundle