Create Python modules for both Repos and Workspace
05-19-2023 09:59 AM
We are using the "databricks_notebook" Terraform resource to deploy our notebooks into the "Workspace" as part of our CI/CD run, and our jobs run the notebooks from the Workspace. For development we clone the repo into "Repos". At the moment the only modularization of our code is done with %run statements and a large "utils" folder in our repo, but I am investigating how to move to an import-based workflow. In "Repos" this works fine: since the root of the repo is automatically added to sys.path, we can do things like "import notebooks.utils.stuff" from anywhere in the tree. But when the code is deployed to the Workspace, only the current path is added, not the root "/Workspace", so the import does not work. I guess we could modify sys.path and add "/Workspace" ourselves, but
1: That is ugly and error prone, and
2: It makes it likely that we will end up importing the wrong version (from /Workspace) when we are developing on a branch in Repos.
Any tips? Have I maybe overlooked something smart?
Just in case it was not clear, here is our folder structure:
- Readme.md
- notebooks/utils/utilnotebook1.py (and many more)
- notebooks/gold/silver/problem1/notebook1.py (and silver + bronze)
- terraform/
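For reference, here is roughly what the sys.path workaround mentioned above would look like. The notebook-context call is the commonly used but unofficial way I know of to get the current notebook path, so treat it as an assumption rather than a stable API:

```python
import sys

# Sketch of the workaround: derive the project root from the running notebook's
# path so that "import notebooks...." resolves both in Repos (development) and
# in the Workspace (deployed). dbutils is only defined inside a Databricks
# notebook, and the context call below is unofficial.
notebook_path = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook().getContext().notebookPath().get()
)

# Everything above the "notebooks/" folder is treated as the project root;
# "/Workspace" is the filesystem prefix for both Repos and workspace files.
project_root = "/Workspace" + notebook_path.rsplit("/notebooks/", 1)[0]

if project_root not in sys.path:
    sys.path.insert(0, project_root)

import notebooks.utils.utilnotebook1  # now resolves the same way in both trees
```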
Labels:
- Databricks notebook
- Python
05-26-2023 01:39 AM
Why don't you run the prod notebooks from Repos as well?
That's what it's meant for.
Workspace files are, as you have noticed, very error-prone.
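For example, the CI/CD run could simply point a production checkout under /Repos at a release tag and have the jobs reference notebooks there. A minimal sketch with the Databricks SDK for Python (the repo id and tag below are placeholders):

```python
# Hypothetical release step: update a production checkout under /Repos to a
# release tag, instead of copying notebooks into the Workspace.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # authenticates from the environment, as in a CI runner

# repo_id and tag are placeholders for your production repo and release tag
w.repos.update(repo_id=123456, tag="v1.2.3")
```

Since the checkout lives under /Repos, sys.path behaves exactly as it does during development on a branch.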
05-27-2023 09:30 AM
You can create your own Python package and host it in Azure Artifacts.
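For instance, the existing "utils" code could be packaged with a minimal setup.py (the name, version, and included packages below are placeholders), published to the feed, and then installed on the cluster:

```python
# setup.py - minimal sketch of turning the shared utils into an installable
# package that could be published to a private feed such as Azure Artifacts.
from setuptools import find_packages, setup

setup(
    name="my-databricks-utils",
    version="0.1.0",
    packages=find_packages(include=["notebooks", "notebooks.*"]),
)
```

Notebooks and jobs then use a normal import, and the cluster installs the wheel from the feed, for example as a cluster library or with %pip install pointing at the feed's index URL.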

