11-26-2024 04:14 AM
Hi!
I am creating an Asset Bundle, which also includes my streaming Delta Live Table Pipelines. I want to move these DLT pipelines to the Asset Bundle, without having to run my DLT streaming Pipeline on all historical files (this takes a lot of compute and time). Is there a way to migrate an existing DLT Pipeline to Asset Bundles?
11-26-2024 06:12 AM
Yes, you can migrate an existing Delta Live Table (DLT) pipeline to an Asset Bundle without having to reprocess all historical files. Here are the steps to achieve this:
1. Create a Databricks Asset Bundle: Use the Databricks CLI to initialize a new bundle. This will create a databricks.yml file in the root of your project, which will be used to define your Databricks resources, including your DLT pipelines.
2. Define the DLT Pipeline in the Bundle: In the databricks.yml file, you will need to define your DLT pipeline. This involves specifying the pipeline's configuration, such as the path to the notebook or script that defines the pipeline logic.
3. Deploy the Bundle: Use the Databricks CLI to deploy the bundle to your target environment. This will create the necessary resources in your Databricks workspace based on the definitions in the databricks.yml file.
4. Run the Pipeline: Once the bundle is deployed, you can run the DLT pipeline from the Databricks extension panel or using the CLI. This will start the pipeline without reprocessing all historical files, as the pipeline will continue from its last processed state.
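As a rough sketch, the validate, deploy, and run steps above might look like this on the command line. The target name and the pipeline resource key are hypothetical placeholders, and the function is only a convenience wrapper, not part of the CLI. Keep in mind that deploying a pipeline definition that is not linked to an existing pipeline creates a new pipeline in the workspace.

```shell
#!/usr/bin/env bash
# Sketch of the validate -> deploy -> run steps.
# Set DATABRICKS_CLI=echo to print the commands instead of running them.

deploy_and_run_pipeline() {
  local target="$1"        # bundle target name, e.g. "dev" (hypothetical)
  local pipeline_key="$2"  # resource key of the pipeline in databricks.yml (hypothetical)
  local cli="${DATABRICKS_CLI:-databricks}"

  # Stop at the first command that fails.
  "$cli" bundle validate -t "$target" &&
  "$cli" bundle deploy -t "$target" &&
  "$cli" bundle run -t "$target" "$pipeline_key"
}

# One-time, interactive project setup, then for example:
#   databricks bundle init
#   deploy_and_run_pipeline dev my_dlt_pipeline
```

Running it once with DATABRICKS_CLI=echo is a cheap way to check the exact commands before touching the workspace.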
11-26-2024 06:19 AM
I have done these steps, but my DLT still took a long time to process. However, the path to the notebook with my pipeline logic has changed because I am deploying it as a bundle. Is this a problem? Also, the name of the pipeline changed because of a prefix I added in the Asset Bundle.
11-26-2024 06:16 AM - edited 11-26-2024 06:17 AM
Hi @Isa1 ,
If you have existing pipelines that were created using the Databricks user interface or API that you want to move to bundles, you must define them in a bundle's configuration files. Databricks recommends that you first create a bundle using the steps below and then validate whether the bundle works. You can then add additional definitions, notebooks, and other sources to the bundle.
You can follow the official documentation entry. Just repeat the steps:
Develop Delta Live Tables pipelines with Databricks Asset Bundles | Databricks on AWS
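If you would rather not write the pipeline definition by hand, the CLI can generate bundle configuration from an existing pipeline. A minimal sketch, assuming your installed Databricks CLI version provides `bundle generate` and that you run it from the bundle root; the wrapper function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: generate bundle configuration for a pipeline that already exists.
# Set DATABRICKS_CLI=echo to print the command instead of running it.

generate_pipeline_config() {
  local pipeline_id="$1"  # ID of the existing pipeline in the workspace
  local cli="${DATABRICKS_CLI:-databricks}"

  "$cli" bundle generate pipeline --existing-pipeline-id "$pipeline_id"
}

# To look up the pipeline ID first:
#   databricks pipelines list-pipelines
# Then, with your own ID in place of the placeholder:
#   generate_pipeline_config <existing-pipeline-id>
```

Generating the configuration only writes files into the bundle project. It does not by itself link the bundle to the existing pipeline.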
11-26-2024 06:23 AM
Changing the path to the notebook or the name of the pipeline in your Delta Live Tables (DLT) pipeline can indeed cause issues. Specifically, either change can lead to the pipeline being recreated.
11-26-2024 06:26 AM - edited 11-26-2024 06:31 AM
So maybe try the bind command? It lets you link bundle-defined jobs and pipelines to existing jobs and pipelines in the Databricks workspace so that they become managed by Databricks Asset Bundles.
https://docs.gcp.databricks.com/en/dev-tools/cli/bundle-commands.html#bind-bundle-resourcesbundle co...
databricks bundle deployment bind [resource-key] [resource-id]
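For example, a bind followed by a deploy could be sketched like this. The resource key, pipeline ID, and target name are placeholders you would replace with your own values, and the wrapper function is hypothetical. After a bind, the next deploy updates the existing pipeline to match the bundle configuration, so it is worth reviewing the bundle definition first.

```shell
#!/usr/bin/env bash
# Sketch: link a bundle-defined pipeline to an existing one, then deploy.
# Set DATABRICKS_CLI=echo to print the commands instead of running them.

bind_existing_pipeline() {
  local pipeline_key="$1"  # resource key from databricks.yml (hypothetical)
  local pipeline_id="$2"   # ID of the existing pipeline in the workspace
  local target="$3"        # bundle target name, e.g. "dev" (hypothetical)
  local cli="${DATABRICKS_CLI:-databricks}"

  # Deploy only if the bind succeeded.
  "$cli" bundle deployment bind "$pipeline_key" "$pipeline_id" -t "$target" &&
  "$cli" bundle deploy -t "$target"
}

# Example, with your own ID in place of the placeholder:
#   bind_existing_pipeline my_dlt_pipeline <existing-pipeline-id> dev
```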
11-26-2024 06:38 AM
And to add one thing: in Delta Live Tables, checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>. So if you want to avoid running your pipeline from the start, you need to use the bind command, because otherwise the new pipeline name will create a new checkpoint directory.
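To make the layout concrete, here is a small illustrative helper that builds the checkpoint directory for a table, assuming the storage_location/checkpoints/<dlt_table_name> layout described above. The function name, storage locations, and table name are made up for the example.

```shell
#!/usr/bin/env bash
# Illustration only: build the checkpoint directory path for one table,
# assuming the <storage_location>/checkpoints/<table_name> layout.

checkpoint_dir() {
  local storage_location="${1%/}"  # drop one trailing slash, if present
  local table_name="$2"
  printf '%s/checkpoints/%s\n' "$storage_location" "$table_name"
}

# Two pipelines with different storage locations (hypothetical paths)
# resolve to different checkpoint directories for the same table name:
checkpoint_dir "dbfs:/pipelines/old-pipeline" "events"
checkpoint_dir "dbfs:/pipelines/new-pipeline" "events"
```

Under that assumption, a newly created pipeline with its own storage location has no existing checkpoints to resume from, which is why binding to the existing pipeline matters.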