Wheel package to install in a serverless workflow
01-13-2025 05:40 AM
Hi guys,
What is the way, through Databricks Asset Bundles, to declare a new job definition that uses serverless compute on each task of the workflow, so that each notebook task can pick up the dependent custom libraries I imported into the workspace?
I did something like this:
environments:
  - environment_key: envir
    spec:
      client: "1"
      dependencies:
        - "${workspace.root_path}/artifacts/.internal/data_pipelines-0.0.1-py3-none-any.whl"
tasks:
  - task_key: schedule_next_run_for_this_job
    description: due to business requirements, the workflow needs to be rescheduled for the near next run
    environment_key: envir
    notebook_task:
      notebook_path: ../notebook/jobs/export.py
      base_parameters:
        function: schedule_next_run_for_this_job
        env: ${bundle.target}
        job_id: "{{job.id}}"
        workspace_url: "{{workspace.url}}"
but it returns:
Error: cannot create job: A task environment can not be provided for notebook task get_email_infos. Please use the %pip magic command to install notebook-scoped Python libraries and Python wheel packages
Is the only way to import a personal wheel package on serverless compute to install that library inside the notebook? Because I would like to do something like:
libraries:
  - whl: ...
01-13-2025 05:44 AM
Hi @jeremy98,
It appears that you are trying to use an environment block to specify dependencies for a notebook task, but this approach is not supported for notebook tasks on serverless compute. Instead, you should use the %pip magic command within the notebook to install the required libraries.
- Create the job definition with the necessary tasks. Each task should specify the notebook path and any parameters required
- Use the %pip magic command inside each notebook to install the custom libraries. This ensures that the libraries are available in the notebook's environment when the task runs.
Here’s an example:
bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: schedule_next_run_for_this_job
          description: due to business requirements, the workflow needs to be rescheduled for the near next run
          notebook_task:
            notebook_path: /Workspace/Users/your_username/notebook/jobs/export.py
            base_parameters:
              function: schedule_next_run_for_this_job
              env: ${bundle.target}
              job_id: "{{job.id}}"
              workspace_url: "{{workspace.url}}"

targets:
  dev:
    default: true
    resources:
      jobs:
        my-job:
          name: my-job
Here is example content for export.py:
# Install the custom library using the %pip magic command
%pip install /Workspace/Shared/Path/To/your_custom_library.whl

# Your notebook code here
def schedule_next_run_for_this_job():
    # Function implementation
    pass

# Call the function
schedule_next_run_for_this_job()
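One caveat: if an older version of the package was already imported in the session, the freshly installed wheel may not be picked up until the Python process restarts. A minimal sketch, reusing the illustrative wheel path above:
%pip install /Workspace/Shared/Path/To/your_custom_library.whl

# Restart the Python process so the newly installed wheel is picked up
# (dbutils.library.restartPython() is available in Databricks notebooks)
dbutils.library.restartPython()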
01-13-2025 05:51 AM
Hi,
Thanks for this answer! But any code from the wheel package should then be imported like this, for example?
from data_pipelines.core.utils.filters import (
    filter_by_time_granularity
)
01-13-2025 05:58 AM
Yes, you can import code from a wheel package in your notebook just like you would with any other Python module. Once you have installed the wheel package using %pip, you can import the functions or classes from the package.
For example, if your wheel package contains a module data_pipelines.core.utils.filters and you want to import the filter_by_time_granularity function, you can do it as follows:
%pip install /Workspace/Shared/Path/To/your_custom_library.whl
from data_pipelines.core.utils.filters import filter_by_time_granularity
01-13-2025 06:02 AM
Hi, mmm ok, but how do I upload the wheel package on every bundle deploy? Because I did it this way:
artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl
But this will deploy the wheel package to my personal root_path:
stg:
  default: true
  workspace:
    host: <host-id>
    root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
How can I specify that the wheel package should be uploaded to a shared location every time?
01-13-2025 06:27 AM - edited 01-13-2025 06:27 AM
And another question: when installing the Python wheel on serverless compute, is it also possible to specify the Python version of the serverless compute? Because I tried to, but it says: ERROR: Package 'data-pipelines' requires a different Python: 3.10.12 not in '<4.0,>=3.11'
01-13-2025 06:37 AM
You can define artifact_path in the workspace mapping. This path should be a shared location accessible by all users who need to use the wheel package:
bundle:
  name: my-bundle

artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl

workspace:
  artifact_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}

targets:
  stg:
    default: true
    workspace:
      host: <host-id>
      root_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}
artifact_path: this specifies the path where the artifacts (wheel packages) will be stored in the workspace. By setting it to a shared location, you ensure that the wheel package is accessible to all users.
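Once deployed, the %pip line inside the notebook can point at the wheel under that shared artifact path. A minimal sketch, assuming the artifact path above and the wheel name from the first post (the exact subfolder may differ, so check where the deploy actually places the wheel in the workspace):
%pip install /Workspace/Shared/Path/To/Shared/Location/.bundle/my-bundle/stg/.internal/data_pipelines-0.0.1-py3-none-any.whl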
01-13-2025 06:38 AM
About your second question: it is not possible to specify the Python version directly when installing a Python wheel. The serverless runtime comes with a built-in Python version, and upgrading or downgrading it could break the system due to dependencies.
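If you want to confirm which Python version the serverless environment actually runs before installing the wheel, a quick check from a notebook cell looks like this (a minimal sketch):
# Print the Python version of the serverless session so it can be compared
# against the wheel's Python constraint (e.g. '>=3.11,<4.0')
import sys
print(sys.version)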
01-13-2025 06:42 AM
Hi, thanks for your answers, really helpful. But does this mean I should find a way to downgrade the Python version specified in my pyproject.toml (and align it with all of my dependencies) in order to be able to run the package on any serverless cluster?
Because I don't know which Python version I will get each time, right?
01-13-2025 07:08 AM
Hi, no problem! Serverless uses the latest DBR version mentioned here: https://docs.databricks.com/en/release-notes/serverless/index.html#version-154, and the Python version follows from that. In this case it is DBR 15.4 LTS, which uses:
- Python: 3.11.0
So we need to refactor any dependencies to be compatible with that Python version, and keep checking whether any release update comes with a different DBR/Python version.
01-13-2025 08:42 AM
Hi Alberto, thanks for the answer again, but I don't understand your point. You said the current runtime also works with Python 3.11, but it seems that when I get a new serverless cluster it doesn't have Python 3.11 but a lower version. What do I need to do?
01-13-2025 09:06 AM
Hey Jeremy, serverless should be using 3.11 too. Do you see a different version? Serverless should pick DBR 15.4, which uses 3.11 according to https://docs.databricks.com/en/release-notes/serverless/index.html#version-154
01-13-2025 09:08 AM
Oh, I see the error above: Python: 3.10.12 not in '<4.0,>=3.11'. I just tested it and it is indeed using 3.10, let me check.
01-13-2025 09:13 AM
I see the reason now: there are 2 serverless environment versions. Version 1 uses Python 3.10.12 and version 2 uses 3.11, please see: https://docs.databricks.com/en/release-notes/serverless/client-two.html
01-13-2025 10:00 AM - edited 01-13-2025 10:21 AM
Hi,
Thanks again for the answer :). OK, but do I need to declare the environment field as I did before? Consider that I'm using DABs.
Like this?
environments:
  - environment_key: env_for_data_pipelines_whl
    spec:
      client: "2"
Edit: I declared it before defining the tasks, so that each task would inherit that specific environment client, but it isn't applied... I still have the same problem.

