
Asset Bundles - path is not contained in bundle root path

kamilmuszynski
New Contributor

I'm trying to adapt a code base to use asset bundles. I was trying to come up with a folder structure that would work for our bundles, and came up with the layout below:


common/            (source code)
services/          (source code)
dist/              (here artifacts from monorepo are built; I can't change this)
db-asset-bundles/
    data-pipeline/
        integration/
            databricks.yaml
        production/
            databricks.yaml
        resources/
            variables.yaml
            artifacts.yaml

I'd like the integration and production bundles to share some common configuration. I've discovered that I can include '../resources/variables.yaml' from both integration/databricks.yaml and production/databricks.yaml, but including artifacts.yaml the same way results in:

Error: path (...redacted...)/db-asset-bundles/data-pipeline/resources is not contained in bundle root path

Are there any rules about what can be included from databricks.yaml? Does the included file have to be in a folder at the same level as the databricks.yaml file, or below it?
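For reference, here's a minimal sketch of what I'm attempting in integration/databricks.yaml (the bundle name is made up; the artifacts include is the line that triggers the error above):

    bundle:
      name: data-pipeline-integration   # placeholder name

    include:
      - ../resources/variables.yaml   # this include works
      - ../resources/artifacts.yaml   # this one fails the bundle-root check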

The same problem happens when I try to include wheels built into the /dist directory at the root of the monorepo: I can't reference them from databricks.yaml, because that would require a path like '../../../dist/[wheel-name]', which results in the same error about the wheel not being contained in the bundle root. So far I've worked around this by defining the artifact in production/databricks.yaml as:

artifacts:
  pipeline-wheel:
    type: whl
    # we use the pantsbuild.org build system for Python, which manages wheel
    # packaging, but all artifacts end up in the /dist dir at root level...
    build: "pants package <path to wheel definition inside services> && mkdir -p dist && cp ../../../dist/<wheel file> dist/<wheel file>"

Are there any ways around this that I'm missing?

Thanks a lot!

1 REPLY

Kaniz_Fatma
Community Manager

Hi @kamilmuszynski, when working with Databricks Asset Bundles, there are specific rules and guidelines for structuring your configuration files.

Let’s break down the key points to address your concerns:

  1. Bundle Configuration File (databricks.yml):

    • Each bundle must contain at least one bundle configuration file named databricks.yml.
    • This file is expressed in YAML format and defines the bundle’s settings.
    • The top-level bundle mapping includes essential information such as the bundle name and compute ID.
    • You can also specify custom variables, Git settings, and workspace settings.
    • The artifacts section allows you to define artifacts for the bundle.
    • The include section lets you reference additional configuration files.
    • For more details, refer to the official documentation.
  2. Artifact Paths and Inclusion:

    • When including files from other directories, ensure that the paths are relative to the bundle root.

    • The error you encountered (“path not contained in bundle root”) indicates that the referenced path is outside the bundle’s scope.

    • To include artifacts from the /dist directory, consider the following approaches:

      a. Relative Paths:
        • If possible, structure your bundle so that the artifacts are within the bundle’s directory tree.
        • For example, place the wheels directly inside the bundle folder or a subdirectory.
        • Then, reference them using relative paths like ./dist/<wheel-name>.

      b. Custom Build Steps:
        • Define a custom build step in your databricks.yml that copies the necessary artifacts during bundle creation.
        • Your existing workaround with pants package and cp can be adapted for this purpose (see the sketch after this list).
        • Ensure that the build step correctly places the wheels in the expected location within the bundle.

  3. Example Configuration:

    • Here’s a simplified example of how you might structure your databricks.yml:

      bundle:
        name: my-bundle
        compute_id: my-compute
        # Other settings...
      
      artifacts:
        pipeline-wheel:
          type: whl
          build: "pants package <path-to-wheel-definition> && mkdir dist && cp <wheel-file> dist/<wheel-file>"
      
  4. Databricks CLI and Workflow:

    • Use the Databricks CLI to validate, deploy, and run bundles.
    • The commands databricks bundle validate, databricks bundle deploy, and databricks bundle run are essential for managing bundles.
    • For more information, explore the Databricks Asset Bundles development workflow.
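Putting items 1 and 2b together, here’s a rough, untested sketch of what your production/databricks.yaml could look like. The bundle name and the <...> placeholders are illustrative, and the include line simply mirrors the variables include you said already works:

      bundle:
        name: data-pipeline-production   # placeholder name

      include:
        - ../resources/variables.yaml    # shared variables, as you already do

      artifacts:
        pipeline-wheel:
          type: whl
          # Build the wheel with pants, then copy it from the monorepo-level
          # /dist into a dist/ folder inside the bundle root, so the artifact
          # path resolves relative to the bundle root.
          build: "pants package <path-to-wheel-definition> && mkdir -p dist && cp ../../../dist/<wheel-file> dist/<wheel-file>"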

Remember that Databricks Asset Bundles provide a way to package and deploy assets consistently across workspaces. By adhering to the guidelines, you can create efficient and manageable bundles for your use case. 🚀
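To make item 4 concrete, the basic loop from the bundle directory looks like this (the trailing resource key is a placeholder for a job or pipeline defined in your bundle):

      databricks bundle validate              # check the bundle configuration
      databricks bundle deploy                # deploy to the target workspace
      databricks bundle run <resource-key>    # run a job or pipeline from the bundle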

 