Delta Live Tables + Databricks Assets Bundles

Elderion
New Contributor II

Hi,

I'm trying to set up a CI/CD pipeline for Delta Live Tables jobs using Databricks Asset Bundles. I have a problem with the path to the notebook in the pipeline. According to this example:

https://docs.databricks.com/en/delta-live-tables/tutorial-bundles.html

the YAML file should look like this:

resources:
  pipelines:
    pipeline_dlt_test:
      name: dlt_test_${bundle.target}
      clusters:
        - label: default
          autoscale:
            min_workers: 1
            max_workers: 5
            mode: ENHANCED
      libraries:
        - notebook:
            path:  /notebooks/dlt_test
      configuration:
        bundle.sourcePath: ${workspace.root_path}/databricks/src/notebooks
      development: true
      catalog: ${bundle.target}_bronze

But it doesn't work:

[Screenshot: Elderion_0-1726152706169.png]

Asset Bundles seem to ignore the configuration entry (bundle.sourcePath).

The fix is pretty simple:

resources:
  pipelines:
    pipeline_dlt_test:
      name: dlt_test_${bundle.target}
      clusters:
        - label: default
          autoscale:
            min_workers: 1
            max_workers: 5
            mode: ENHANCED
      libraries:
        - notebook:
            path:  ${workspace.file_path}/databricks/src/notebooks/dlt_test
            #path:  /dlt_test
      #configuration:
      #  bundle.sourcePath: ${workspace.root_path}/databricks/src/notebooks
      development: true
      catalog: ${bundle.target}_bronze
 
This sets the correct path to the notebook, but I'm a little confused.
I have databricks.yml in the root path.

Why is the bundle.sourcePath variable ignored?

 

 

 

filipniziol
Contributor

Hi @Elderion ,

A couple of lines from Databricks AI assistant:

The bundle.sourcePath setting appears to be ignored in your CI/CD setup for Delta Live Tables (DLT) with Databricks Asset Bundles because it plays no part in how the notebook paths of a pipeline are resolved.

In the pipeline YAML, bundle.sourcePath is just a key under configuration. As far as I can tell, entries under configuration are handed to the pipeline as plain key-value settings that notebook code can read at run time (for example with spark.conf.get, to find its own source directory). Neither the bundle deployment nor the pipeline combines that value with the paths listed under libraries.

The paths in the libraries section, on the other hand, must point to the notebook itself. The ${workspace.file_path} variable resolves to the workspace folder that the bundle's files are uploaded to, which is why your fix works: it gives the full path of the notebook in the workspace after deployment instead of relying on bundle.sourcePath.

In summary, your fix is the correct approach for specifying the path to the notebook in the DLT pipeline configuration. The bundle.sourcePath entry is not used to resolve library paths.

To sum up, "sourcePath" is only a configuration value passed to the pipeline's code, while the "workspace" variables describe where your files end up in the target after deployment. You are right that the example on the page is quite confusing, and the configuration generated by default probably does not help.

 

What you can also do is use a path relative to the location of databricks.yml, like .../databricks/src/notebooks/dlt_test.
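
For context, here is a minimal sketch of how notebook code might consume bundle.sourcePath, assuming the key is simply a pipeline configuration value read through spark.conf.get. The helper name add_source_path and the example workspace path are hypothetical, and a dict stands in for the Spark configuration so the snippet also runs outside Databricks.

```python
import sys
from typing import Callable


def add_source_path(
    conf_get: Callable[[str, str], str],
    key: str = "bundle.sourcePath",
) -> str:
    """Read a directory from the configuration and make it importable.

    conf_get takes (key, default) and returns a string, which is the
    shape of spark.conf.get in a notebook.
    """
    source_path = conf_get(key, ".")  # fall back to the current directory
    if source_path not in sys.path:
        sys.path.append(source_path)
    return source_path


# In a DLT notebook one would call: add_source_path(spark.conf.get)
# Here a dict plays the role of the pipeline configuration (assumption).
fake_conf = {
    "bundle.sourcePath": "/Workspace/Users/someone@example.com/.bundle/demo/dev/files/src"
}
resolved = add_source_path(lambda k, default: fake_conf.get(k, default))
print(resolved)
```

Nothing in this sketch touches the libraries section, which is the point: the value only matters to code that explicitly asks for it.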

 

 


Perfect, thanks! Sounds like a good explanation. 🙂

ThierryBa
New Contributor III

I had this error once.

You need to specify the extension of your file. If the notebook is Python, the path must end in .py; likewise, it must end in .sql if you used SQL.

 

    libraries:
        - notebook:
            path:  ${workspace.file_path}/databricks/src/notebooks/dlt_test.py 
 
OR
 
    libraries:
        - notebook:
            path:  ${workspace.file_path}/databricks/src/notebooks/dlt_test.sql
 
 
Data and Analytics Practice Lead
