Data Engineering

Asset Bundles - path is not contained in bundle root path

kamilmuszynski
New Contributor II

I'm trying to adapt a code base to use asset bundles. I was trying to come up with a folder structure that would work for our bundles and arrived at the layout below:


common/                  (source code)
services/                (source code)
dist/                    (artifacts from the monorepo are built here; I can't change this)
db-asset-bundles/
  data-pipeline/
    integration/
      databricks.yaml
    production/
      databricks.yaml
    resources/
      variables.yaml
      artifacts.yaml

I'd like the integration and production bundles to share some common configuration. I've discovered that I can include '../resources/variables.yaml' from both integration/databricks.yaml and production/databricks.yaml, but including '../resources/artifacts.yaml' results in:

Error: path (...redacted...)/db-asset-bundles/data-pipeline/resources is not contained in bundle root path

Are there any rules about what can be included from databricks.yaml? Does an included file have to be in the same folder as databricks.yaml, or below it?
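For reference, here's roughly what the working include looks like in my integration/databricks.yaml (a minimal sketch; the bundle name is made up):

bundle:
  name: data-pipeline-integration

include:
  - ../resources/variables.yaml    # this include works
  # - ../resources/artifacts.yaml  # this one fails with the "not contained in bundle root path" error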

The same problem happens when I try to include wheels built into the /dist directory at the root of the monorepo: I can't reference them from databricks.yaml, since that would require a path like '../../../dist/[wheel-name]', and that results in the same error about the wheel not being contained in the bundle root. So far I've worked around this by defining the artifact in production/databricks.yaml as:

artifacts:
  pipeline-wheel:
    type: whl
    build: "pants package <path to wheel definition inside services> && mkdir dist && cp ../../../dist/<wheel file> dist/<wheel file>"
    # we use the pantsbuild.org build system for Python, which manages wheel packaging, but all artifacts end up in the /dist dir at the root level...
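In case anyone copies this: a slightly hardened variant of the same workaround, with mkdir -p so re-runs don't fail, plus an explicit files mapping pointing at the copied wheel. The placeholders are the same as above, and the files key is my reading of the artifacts spec, so treat it as a sketch:

artifacts:
  pipeline-wheel:
    type: whl
    # copy the built wheel from the monorepo-level /dist into a dist/ folder
    # inside the bundle root, so the root-path check passes
    build: "pants package <path to wheel definition inside services> && mkdir -p dist && cp ../../../dist/<wheel file> dist/<wheel file>"
    files:
      - source: dist/<wheel file>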

Are there any ways around this that I'm missing?

Thanks a lot!

3 REPLIES

AlbertoLogs
New Contributor II

@kamilmuszynski – Did you figure it out already?

PabloCSD
Contributor II

When I've worked with Databricks Asset Bundles (DABs), I've kept just one databricks.yaml file, placed at the root of the project.

I also made a simple, functional DAB project; its file structure looks like this, in case it helps you:

dab_test_repo/
├── conf/
│   └── tasks/
│       ├── input_task_config.yml
│       ├── process_task_config.yml
│       └── output_task_config.yml
├── dab_test_repo/
│   ├── tasks/
│   │   ├── __init__.py
│   │   ├── input.py
│   │   ├── process.py
│   │   └── output.py
│   ├── __init__.py
│   └── common.py
├── tests/
│   ├── unit/
│   │   ├── tasks/
│   │   │   ├── __init__.py
│   │   │   ├── test_input.py
│   │   │   ├── test_process.py
│   │   │   └── test_output.py
│   │   ├── __init__.py
│   │   └── conftest.py
│   └── __init__.py
├── dist/
│   └── dab_test_repo-0.1.0-py3-none-any.whl
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── databricks.yml
└── pyproject.toml

I haven't tried using multiple databricks.yml files, but in my single databricks.yml I have configurations for deploying both the integration and production pipelines.
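Roughly, that part of my databricks.yml looks like this (the hosts are placeholders and the build command is just an example, so adapt it to your setup):

bundle:
  name: dab_test_repo

artifacts:
  default:
    type: whl
    build: python -m build --wheel   # produces dist/dab_test_repo-0.1.0-py3-none-any.whl
    path: .

targets:
  integration:
    mode: development
    workspace:
      host: https://<integration-workspace>.cloud.databricks.com
  production:
    mode: production
    workspace:
      host: https://<production-workspace>.cloud.databricks.com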

kamilmuszynski
New Contributor II

Thanks for the suggestion.

What I ended up doing was to keep a separate directory with a databricks.yaml per pipeline, but with each file defining all targets (dev, int, prod), roughly as in the sketch below. I think a top-level databricks.yaml with proper excludes per target would also work - I need to give it a try at some point 🙂
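A rough sketch of what each per-pipeline file looks like now (the structure follows my layout above; the names and modes are illustrative):

db-asset-bundles/data-pipeline/databricks.yaml:

bundle:
  name: data-pipeline

include:
  - resources/*.yaml   # resources/ is now inside the bundle root, so this include is allowed

targets:
  dev:
    mode: development
  int:
    mode: development
  prod:
    mode: production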
