Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Create Python modules for both Repos and Workspace

Erik
Valued Contributor II

We are using the "databricks_notebook" Terraform resource to deploy our notebooks into the "Workspace" as part of our CI/CD run, and our jobs run notebooks from the Workspace. For development we clone the repo into "Repos". At the moment the only modularization of our code is done with %run statements, and we have a large "utils" folder in our repo, but I am investigating how to move to an import-based workflow.

In "Repos" this works fine: the root of the repo is automatically added to sys.path, so we can do things like "import notebooks.utils.stuff" from anywhere in the tree. But when deployed to the Workspace, only the current path is added, not the root "/Workspace", so the import does not work. I guess we could modify sys.path and add "/Workspace" (a sketch of that idea follows the folder structure below), but:

1: That is ugly and error prone, and

2: It makes it likely that we will end up importing the wrong version (from /Workspace) when we are developing on a branch in Repos.

Any tips? Have I maybe overlooked something smart?

Just in case it was not clear, here is our folder structure:

  • Readme.md
  • notebooks/utils/utilnotebook1.py (and many more)
  • notebooks/gold/silver/problem1/notebook1.py (and silver + bronze)
  • terraform/
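
For illustration, here is roughly what that sys.path guard could look like. This is a hedged sketch, not a drop-in solution: it assumes os.getcwd() points inside the Repos checkout when running from a branch (which holds on recent runtimes), and every path in it is a placeholder.

```python
import os
import sys

# Prefer the Repos checkout while developing; fall back to the CI/CD-deployed
# Workspace copy otherwise. All paths below are placeholders.
cwd = os.getcwd()
if cwd.startswith("/Workspace/Repos/"):
    # e.g. /Workspace/Repos/<user>/<repo>/notebooks/gold -> keep the first
    # five components: ["", "Workspace", "Repos", <user>, <repo>]
    root = "/".join(cwd.split("/")[:5])
else:
    root = "/Workspace"

if root not in sys.path:
    sys.path.insert(0, root)  # insert first so this tree shadows any other copy

# import notebooks.utils.utilnotebook1  # now resolves against a single tree
```

Inserting at position 0 is what addresses concern 2: even when both copies exist, the tree belonging to the current run wins.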

2 REPLIES

-werners-
Esteemed Contributor III

Why don't you run the prod notebooks from Repos as well?

That's what it's meant for.

Workspace files are, as you have noticed, very error prone.
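
For what it's worth, here is a hedged sketch of pointing a job directly at a Repos checkout via the Jobs 2.1 REST API. The host, token, cluster id, and /Repos path are all placeholders, and the same notebook_path idea carries over to the databricks_job Terraform resource.

```python
import requests

# Sketch: create a job whose task runs the notebook from a repo checkout
# (e.g. one owned by a service principal and pinned to the release branch)
# instead of a copy deployed under /Workspace. All identifiers are placeholders.
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "name": "problem1-gold",
        "tasks": [{
            "task_key": "main",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {
                "notebook_path": "/Repos/prod/<repo>/notebooks/gold/silver/problem1/notebook1",
            },
        }],
    },
)
resp.raise_for_status()
print(resp.json()["job_id"])
```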

RobiTakToRobi
New Contributor II

You can create your own Python package and host it in Azure Artifacts. 

https://learn.microsoft.com/en-us/azure/devops/artifacts/quickstarts/python-packages?view=azure-devo...
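
To sketch what that could look like (all names here are hypothetical, and it assumes the notebooks/ tree carries __init__.py files so setuptools can discover it as a package):

```python
# setup.py -- minimal packaging sketch for the utils tree
from setuptools import setup, find_packages

setup(
    name="our-databricks-utils",  # hypothetical package name
    version="0.1.0",
    packages=find_packages(include=["notebooks", "notebooks.*"]),
)
```

Once built (e.g. with python -m build) and pushed to the feed with twine, the wheel can be installed on clusters with pip using the standard Azure Artifacts index URL pattern, https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/.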
