Data Engineering

Asset Bundles: Shared libraries and notebooks in monorepo multi-bundle setup

auso
New Contributor

I am part of a small team of Data Engineers that started using Databricks Asset Bundles one year ago. Our code base consists of typical ETL workloads written primarily in Jupyter notebooks (.ipynb) plus job definitions (.yaml), and it spans a large number of different business domains.

Currently, our code base consists of a single monorepo with one large bundle containing all our notebooks, jobs, libraries etc.

Our code base has grown to a size where we see the need to split our single bundle into several smaller bundles - one for each business domain.

We are envisioning a setup similar to the following (simplified) structure:

monorepo/
│
├── shared_notebooks/
├── shared_libraries/
├── variables.yml
│
├── Bundle_A/
│   ├── resources/
│   ├── src/
│   └── databricks.yml
│
└── Bundle_B/
    ├── resources/
    ├── src/
    └── databricks.yml

The repo contains some shared notebooks and libraries that may be used by all bundles in the repository.

Does anyone have some suggestions for how this should be implemented?

  1. How can we "import" shared assets (notebooks, libraries and variables) into our bundles?
  2. Does our approach to splitting up our mono-bundle repository seem sensible?

Thanks in advance for your insights!

Kaspar Hauser

3 REPLIES

bee-jugger
New Contributor II

@auso, did you get an answer or a solution?

Witold
Honored Contributor

Yes, it's feasible. In DAB you just need to use paths to import common libraries.
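
As a rough sketch of what "using paths" could look like for the shared notebooks: a bundle's databricks.yml can list folders outside the bundle root under sync.paths, so they are uploaded together with the bundle's own files. The bundle name below is made up and the folder names come from the layout in the question; treat this as an assumption to check against the bundle configuration reference for your CLI version, not as a confirmed setup.

```yaml
# Bundle_A/databricks.yml (sketch; the bundle name is hypothetical)
bundle:
  name: bundle_a

sync:
  paths:
    - ../shared_notebooks   # shared folder that lives outside the bundle root
    - .                     # the bundle's own files
```

As far as I know, once a folder outside the bundle root is listed, the deployed file tree is rooted at the common parent folder, so workspace paths to notebooks may sit one level deeper than in a bundle that only syncs itself.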

-werners-
Esteemed Contributor III

1. The easiest way to do this is to package your shared libraries into a wheel (assuming you use Python). That way you do not have to mess with the PYTHONPATH, and you can install these libs automatically on any cluster (via policies, DABs, or whatever).

2. Totally makes sense; we do it in a similar way.
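
For point 1, a minimal sketch of how a bundle might build the shared wheel and attach it to a job task. It assumes shared_libraries/ contains a buildable Python project (pyproject.toml or setup.py) and that the build package is installed wherever the bundle is deployed from. The job, task, and notebook names are hypothetical, and compute settings are left out.

```yaml
# Bundle_A/databricks.yml (sketch; job, task and notebook names are hypothetical)
artifacts:
  shared_libraries:
    type: whl
    path: ../shared_libraries        # assumes a pyproject.toml or setup.py lives here
    build: python -m build --wheel   # assumes the "build" package is installed

resources:
  jobs:
    example_job:
      name: example_job
      tasks:
        - task_key: main
          # compute settings (job cluster etc.) omitted for brevity
          notebook_task:
            notebook_path: ./src/main.ipynb
          libraries:
            - whl: ../shared_libraries/dist/*.whl   # wheel produced by the build step
```

Each domain bundle would carry the same artifacts entry, so the shared code is versioned and installed like any other dependency instead of being imported through relative paths.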
