Asset Bundles - path is not contained in bundle root path

kamilmuszynski
New Contributor

I'm trying to adapt a code base to use Asset Bundles. I was trying to come up with a folder structure that would work for our bundles and came up with the layout below:


common/ (source code)
services/ (source code)
dist/ (artifacts from the monorepo are built here; I can't change this)
db-asset-bundles/
  data-pipeline/
    integration/
      databricks.yaml
    production/
      databricks.yaml
    resources/
      variables.yaml
      artifacts.yaml

I'd like the integration and production bundles to share some common configuration. I've discovered that I can include '../resources/variables.yaml' from both integration/databricks.yaml and production/databricks.yaml, but including '../resources/artifacts.yaml' results in:

Error: path (...redacted...)/db-asset-bundles/data-pipeline/resources is not contained in bundle root path
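
For reference, the include sections look roughly like this (simplified sketch; the bundle name is a placeholder):

# integration/databricks.yaml
bundle:
  name: data-pipeline-integration

include:
  - ../resources/variables.yaml   # this is accepted
  - ../resources/artifacts.yaml   # this triggers the error above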

Are there any rules about what can be included from databricks.yaml? Does it have to be a folder on the same level as, or below, the file?

The same problem happens when I try to include wheels built into the /dist directory at the root of the monorepo: I can't reference them from databricks.yaml, since that would require a path like '../../../dist/[wheel-name]', which results in the same error about the wheel not being contained in the bundle root. So far I've worked around this by defining the artifact in production/databricks.yaml as:

artifacts:
  pipeline-wheel:
    type: whl
    # we use the pantsbuild.org build system for Python, which manages wheel packaging,
    # but all artifacts end up in the /dist dir at the root level...
    build: "pants package <path to wheel definition inside services> && mkdir dist && cp ../../../dist/<wheel file> dist/<wheel file>"

Are there any ways around this that I'm missing?

Thanks a lot!


Kaniz
Community Manager

Hi @kamilmuszynski, when working with Databricks Asset Bundles, there are specific rules and guidelines for structuring your configuration files.

Let’s break down the key points to address your concerns:

  1. Bundle Configuration File (databricks.yml):

    • Each bundle must contain at least one bundle configuration file named databricks.yml.
    • This file is expressed in YAML format and defines the bundle’s settings.
    • The top-level bundle mapping includes essential information such as the bundle name and compute ID.
    • You can also specify custom variables, Git settings, and workspace settings.
    • The artifacts section allows you to define artifacts for the bundle.
    • The include section lets you reference additional configuration files.
    • For more details, refer to the official documentation.
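    • As a minimal sketch, the top-level mappings fit together like this (names and values here are placeholders, not taken from your repo):

      bundle:
        name: my-bundle

      variables:
        environment:
          description: Deployment environment
          default: integration

      include:
        - resources/*.yml   # glob patterns are resolved relative to the bundle root

      artifacts:
        my-wheel:
          type: whl
          path: ./my_package   # directory inside the bundle root where the wheel is built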
  2. Artifact Paths and Inclusion:

    • When including files from other directories, the paths are resolved relative to the bundle root and must stay inside it.

    • The error you encountered (“path not contained in bundle root”) indicates that the referenced path is outside the bundle’s scope.

    • To include artifacts from the /dist directory, consider the following approaches:

      a. Relative Paths:
        - If possible, structure your bundle so that the artifacts are within the bundle's directory tree.
        - For example, place the wheels directly inside the bundle folder or a subdirectory.
        - Then, reference them using relative paths like ./dist/<wheel-name> (see the sketch after these approaches).

      b. Custom Build Steps:
        - Define a custom build step in your databricks.yml that copies the necessary artifacts during bundle creation.
        - Your existing workaround with pants package and cp can be adapted for this purpose.
        - Ensure that the build step correctly places the wheels in the expected location within the bundle.
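      For approach (a), once the wheel sits inside the bundle root, a job task can reference it with a relative path. A sketch (the job key, package name, and entry point are placeholders; cluster settings omitted):

        resources:
          jobs:
            pipeline-job:
              tasks:
                - task_key: main
                  python_wheel_task:
                    package_name: <package-name>
                    entry_point: <entry-point>
                  libraries:
                    - whl: ./dist/<wheel-name>   # relative to the bundle root, so it passes the containment check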

  3. Example Configuration:

    • Here’s a simplified example of how you might structure your databricks.yml:

      bundle:
        name: my-bundle
        compute_id: my-compute
        # Other settings...
      
      artifacts:
        pipeline-wheel:
          type: whl
          build: "pants package <path-to-wheel-definition> && mkdir dist && cp <wheel-file> dist/<wheel-file>"
      
  4. Databricks CLI and Workflow:

    • Use the Databricks CLI to validate, deploy, and run bundles.
    • The commands databricks bundle validate, databricks bundle deploy, and databricks bundle run are essential for managing bundles.
    • For more information, explore the Databricks Asset Bundles development workflow.
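    • A typical loop, run from the bundle directory, looks like this (the target names are assumptions based on your integration/production layout; <resource-key> is the key of a job or pipeline defined in the bundle):

      databricks bundle validate
      databricks bundle deploy -t production
      databricks bundle run -t production <resource-key>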

Remember that Databricks Asset Bundles provide a way to package and deploy assets consistently across workspaces. By adhering to the guidelines, you can create efficient and manageable bundles for your use case. 🚀

 