09-12-2024 07:54 AM
Hi,
I'm trying to set up a CI/CD pipeline for Delta Live Tables jobs using Databricks Asset Bundles. I have a problem with the path to the notebook in the pipeline. According to this example:
https://docs.databricks.com/en/delta-live-tables/tutorial-bundles.html
the YAML file should look like:
But it doesn't work:
Asset Bundles ignore the configuration (bundle.sourcePath).
The fix is pretty simple:
09-12-2024 08:49 AM - edited 09-12-2024 08:52 AM
Hi @Elderion ,
A couple of lines from the Databricks AI assistant:
The issue you're encountering with the bundle.sourcePath variable being ignored in your CI/CD pipeline setup for Delta Live Tables (DLT) using Databricks Repos and Bundles seems to stem from a misunderstanding of how asset bundle paths are resolved in the DLT configuration.
In the DLT configuration within a YAML file, the bundle.sourcePath is intended to specify the local directory that contains the assets (e.g., notebooks, libraries) to be included in the bundle when you're running databricks bundle deploy. This means it's used during the bundle creation process to determine what local files should be included in the bundle.
However, when specifying paths in the libraries section of your DLT pipeline configuration, you're actually referencing the path within the Databricks workspace or the bundle itself after it has been uploaded. The ${workspace.file_path} variable is used to reference the path of the file within the workspace, which is why your fix works. It correctly specifies the path to the notebook within the bundle or workspace, rather than relying on the bundle.sourcePath which is not used at runtime for resolving asset paths in the pipeline configuration.
The bundle.sourcePath is not used by the DLT pipeline configuration at runtime to resolve paths to assets; it's only used at bundle creation time. To reference assets within your DLT pipeline configuration, you should use workspace or bundle-relative paths, as you've discovered.
In summary, your fix is the correct approach for specifying the path to the notebook in the DLT pipeline configuration. The bundle.sourcePath is for bundle creation, not for runtime asset path resolution in the pipeline configuration.
To sum up, "sourcePath" is what you provide to specify where the bundle is defined locally, while the "workspace" variables relate to your targets after deployment. You are right that the example on the page is quite confusing, and the configuration generated by default probably does not help.
What you can also do is use a path relative to the location of databricks.yml, like .../databricks/src/notebooks/dlt_test.
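As a rough sketch of the two options above, the pipeline definition could look something like this. The bundle name, the pipeline key, and the folder layout (a src/notebooks folder next to databricks.yml) are assumptions for illustration, not taken from the original post; only ${workspace.file_path} and the dlt_test notebook name come from this thread.

```yaml
# databricks.yml (sketch; bundle name, pipeline key, and folder layout are hypothetical)
bundle:
  name: dlt_test_bundle

resources:
  pipelines:
    dlt_test_pipeline:
      name: dlt_test_pipeline
      libraries:
        - notebook:
            # Option 1: path relative to this YAML file (assumes src/ sits next to it)
            path: ./src/notebooks/dlt_test.py
            # Option 2: workspace path of the deployed copy, as in the original fix
            # path: ${workspace.file_path}/src/notebooks/dlt_test
```

Running databricks bundle validate before deploying should show whether the chosen path resolves.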
09-13-2024 05:38 AM
Perfect, thanks! Sounds like a good explanation. 🙂
09-12-2024 06:03 PM
I had this error once.
You need to specify the extension of your file. If you set the notebook to be Python, then the path must end in .py; likewise .sql if you used SQL.
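For example, something along these lines, assuming the notebooks are referenced by a local relative path; the pipeline key and file names are made up for illustration:

```yaml
# Sketch only: pipeline key and notebook file names are hypothetical
resources:
  pipelines:
    dlt_test_pipeline:
      libraries:
        - notebook:
            path: ./src/notebooks/dlt_test.py        # Python notebook, ends in .py
        - notebook:
            path: ./src/notebooks/dlt_test_sql.sql   # SQL notebook, ends in .sql
```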