03-06-2024 03:14 AM
I'm trying to adapt a code base to use asset bundles. I was trying to come up with a folder structure that would work for our bundles and arrived at the layout below:
common/              (source code)
services/            (source code)
dist/                (artifacts from the monorepo are built here; I can't change this)
db-asset-bundles/
  data-pipeline/
    integration/
      databricks.yaml
    production/
      databricks.yaml
    resources/
      variables.yaml
      artifacts.yaml
I'd like the integration and production bundles to share some common configuration. I've discovered that I can include '../resources/variables.yaml' from both integration/databricks.yaml and production/databricks.yaml, but including artifacts.yaml results in:
Error: path (...redacted...)/db-asset-bundles/data-pipeline/resources is not contained in bundle root path
Are there any rules about what can be included from databricks.yaml? Does it have to be a folder at the same level as, or below, the file?
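For reference, the setup that triggers this looks roughly like the following (the bundle name is made up for the example):

# db-asset-bundles/data-pipeline/integration/databricks.yaml
bundle:
  name: data-pipeline-int

include:
  - ../resources/variables.yaml   # this include resolves fine
  - ../resources/artifacts.yaml   # this one fails with the error above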
The same problem happens when I try to include wheels built into the /dist directory at the root of the monorepo - I can't reference them from databricks.yaml, as that would require a path like '../../../dist/[wheel-name]', and that results in the same error about the wheel not being contained in the bundle root. So far I've worked around this by defining the artifact in production/databricks.yaml as:
artifacts:
  pipeline-wheel:
    type: whl
    build: "pants package <path to wheel definition inside services> && mkdir dist && cp ../../../dist/<wheel file> dist/<wheel file>"
    # we use the pantsbuild.org build system for Python, which manages wheel
    # packaging, but all artifacts end up in the /dist dir at the root level...
Are there any ways around this that I'm missing?
Thanks a lot!
10-16-2024 04:41 AM
@kamilmuszynski – Did you figure it out already?
10-16-2024 06:05 AM - edited 10-16-2024 06:08 AM
When I've worked with Databricks Asset Bundles (DAB), I've kept a single databricks.yml file in the repo root - just the one file.
I also put together a simple, functional DAB project; its file-system structure looks like this, in case it helps:
dab_test_repo/
├── conf/
│   └── tasks/
│       ├── input_task_config.yml
│       ├── process_task_config.yml
│       └── output_task_config.yml
├── dab_test_repo/
│   ├── tasks/
│   │   ├── __init__.py
│   │   ├── input.py
│   │   ├── process.py
│   │   └── output.py
│   ├── __init__.py
│   └── common.py
├── tests/
│   ├── unit/
│   │   ├── tasks/
│   │   │   ├── __init__.py
│   │   │   ├── test_input.py
│   │   │   ├── test_process.py
│   │   │   └── test_output.py
│   │   ├── __init__.py
│   │   └── conftest.py
│   └── __init__.py
├── dist/
│   └── dab_test_repo-0.1.0-py3-none-any.whl
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── databricks.yml
└── pyproject.toml
I haven't tried using multiple databricks.yml files, but in my single databricks.yml I have target configurations for deploying both the integration and production pipelines.
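A stripped-down sketch of what that single file can look like (the workspace hosts and the build command are placeholders, not my actual values):

# databricks.yml at the repo root
bundle:
  name: dab_test_repo

artifacts:
  default:
    type: whl
    path: .
    build: python -m build --wheel   # placeholder; anything that produces dist/*.whl

targets:
  integration:
    mode: development
    workspace:
      host: https://<integration-workspace>.cloud.databricks.com
  production:
    mode: production
    workspace:
      host: https://<production-workspace>.cloud.databricks.com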
10-18-2024 12:28 AM
Thanks for the suggestion.
What I ended up doing was having a separate directory with a databricks.yaml for each pipeline, with each file defining all targets (dev, int, prod) - roughly as in the sketch below. I think a single top-level databricks.yaml would also work, with proper excludes per target - I need to give it a try at some point 🙂
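Sketch of the per-pipeline file (workspace hosts are placeholders; the shared resource files were moved inside each bundle root so the includes resolve):

# db-asset-bundles/data-pipeline/databricks.yaml
bundle:
  name: data-pipeline

include:
  - resources/*.yaml   # shared variable/resource definitions, now inside the bundle root

targets:
  dev:
    mode: development
    default: true
  int:
    workspace:
      host: https://<int-workspace>.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.cloud.databricks.com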
03-11-2025 02:26 AM
Have you ever found a solution for this? I'm looking at the same use case of having certain resources excluded per target.