4 weeks ago
Hi guys,
What is the right way, with Databricks Asset Bundles, to declare a new job definition that uses serverless compute for each task of the workflow, so that every notebook task can pick up the custom dependency libraries I uploaded to the workspace?
I did something like this:
environments:
  - environment_key: envir
    spec:
      client: "1"
      dependencies:
        - "${workspace.root_path}/artifacts/.internal/data_pipelines-0.0.1-py3-none-any.whl"
tasks:
  - task_key: schedule_next_run_for_this_job
    description: due to business requirements is needed to reschedule the workflow in the near next run
    environment_key: envir
    notebook_task:
      notebook_path: ../notebook/jobs/export.py
      base_parameters:
        function: schedule_next_run_for_this_job
        env: ${bundle.target}
        job_id: "{{job.id}}"
        workspace_url: "{{workspace.url}}"
but it returns:
Error: cannot create job: A task environment can not be provided for notebook task get_email_infos. Please use the %pip magic command to install notebook-scoped Python libraries and Python wheel packages
Is installing the library inside the notebook really the only way to use a personal wheel package on serverless compute?
Because I want to do something like using:
libraries:
  - whl: ...
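In other words, roughly like this (just a sketch to illustrate what I mean; the wheel path is the same placeholder as above):
tasks:
  - task_key: schedule_next_run_for_this_job
    # this is the pattern I have in mind; I'm not sure it is accepted on serverless compute
    libraries:
      - whl: ${workspace.root_path}/artifacts/.internal/data_pipelines-0.0.1-py3-none-any.whl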
4 weeks ago
Hi @jeremy98,
It appears that you are trying to use an environment block to specify dependencies for a notebook task, but this approach is not supported for notebook tasks on serverless compute. Instead, you should use the %pip magic command within the notebook to install the required libraries.
Here’s an example:
bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: schedule_next_run_for_this_job
          description: due to business requirements is needed to reschedule the workflow in the near next run
          notebook_task:
            notebook_path: /Workspace/Users/your_username/notebook/jobs/export.py
            base_parameters:
              function: schedule_next_run_for_this_job
              env: ${bundle.target}
              job_id: "{{job.id}}"
              workspace_url: "{{workspace.url}}"

targets:
  dev:
    default: true
    resources:
      jobs:
        my-job:
          name: my-job
The example content of export.py:
# Install custom libraries using %pip magic command
%pip install /Workspace/Shared/Path/To/your_custom_library.whl

# Your notebook code here
def schedule_next_run_for_this_job():
    # Function implementation
    pass

# Call the function with parameters
schedule_next_run_for_this_job()
4 weeks ago
Hi,
Thanks for this answer! But should code from the wheel package be imported like this, for example?
from data_pipelines.core.utils.filters import (
    filter_by_time_granularity
)
4 weeks ago
Yes, you can import code from a wheel package in your notebook just like you would with any other Python module. Once you have installed the wheel package using %pip, you can import the functions or classes from the package.
For example, if your wheel package contains a module data_pipelines.core.utils.filters and you want to import the filter_by_time_granularity function, you can do it as follows:
%pip install /Workspace/Shared/Path/To/your_custom_library.whl
from data_pipelines.core.utils.filters import filter_by_time_granularity
4 weeks ago
Hi, mmm ok, but how do I upload the wheel package on every DAB deployment? Because I did it this way:
artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl
But this will deploy the wheel package to my personal root_path:
stg:
  default: true
  workspace:
    host: <host-id>
    root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
How do I specify that the wheel package should be uploaded to a shared location every time?
4 weeks ago - last edited 4 weeks ago
And another question: when installing the Python wheel on serverless compute, is it also possible to specify the Python version of the serverless compute? I tried to do it, but it says: ERROR: Package 'data-pipelines' requires a different Python: 3.10.12 not in '<4.0,>=3.11'
4 weeks ago
You can define the artifact_path in the workspace mapping. This path should be a shared location accessible by all users who need to use the wheel package:
bundle:
  name: my-bundle

artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl

workspace:
  artifact_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}

targets:
  stg:
    default: true
    workspace:
      host: <host-id>
      root_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}
artifact_path: This specifies the path where the artifacts (wheel packages) will be stored in the workspace. By setting it to a shared location, you ensure that the wheel package is accessible to all users
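As a follow-up sketch (illustrative only, not taken from your bundle): once the wheel is uploaded under artifact_path, you can pass its location to the notebook as a base parameter so the %pip install inside export.py does not hard-code the shared path:
tasks:
  - task_key: schedule_next_run_for_this_job
    notebook_task:
      notebook_path: ../notebook/jobs/export.py
      base_parameters:
        # assumption: the built wheel is synced under ${workspace.artifact_path}/.internal,
        # matching the path pattern shown earlier in this thread
        wheel_path: ${workspace.artifact_path}/.internal/data_pipelines-0.0.1-py3-none-any.whl
The notebook can then read wheel_path (for example with dbutils.widgets.get) instead of hard-coding the shared location.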
4 weeks ago
About your second question: it is not possible to specify the Python version directly when installing a Python wheel. The serverless runtime comes with a built-in Python version, and upgrading or downgrading it may break the system due to dependencies.
4 weeks ago
Hi, thanks for your answers, really helpful. But does this mean I should find a way to downgrade the Python version specified in my pyproject.toml (and align all of my dependencies with it), in order to be able to run the package on any serverless cluster?
Because I don't know which Python version I will get every time, right?
4 weeks ago
Hi, no problem! Serverless will use the latest DBR version mentioned here: https://docs.databricks.com/en/release-notes/serverless/index.html#version-154, and the Python version follows from that. In this case that is DBR 15.4 LTS, which uses Python 3.11.0. So we need to refactor any dependencies to be compatible with that Python version, and keep checking whether a release update comes with a different DBR/Python version.
4 weeks ago
Hi Alberto, thanks for the answer again, but I don't understand your point. You said the current cluster also works with Python 3.11, but it seems that when I grabbed a new serverless cluster it didn't have Python 3.11 but a lower version. What do I need to do?
4 weeks ago
Hey Jeremy, serverless should be using 3.11 too, do you see a different version? Serverless should pick DBR 15.4, which uses 3.11, based on https://docs.databricks.com/en/release-notes/serverless/index.html#version-154
4 weeks ago
Oh, I see the error above, Python: 3.10.12 not in '<4.0,>=3.11'. I just tested it and it is indeed using 3.10, let me check.
4 weeks ago
I see the reason now: there are two serverless environment versions. Client version 1 uses Python 3.10.12 and client version 2 uses 3.11, please see: https://docs.databricks.com/en/release-notes/serverless/client-two.html
4 weeks ago - last edited 4 weeks ago
Hi,
Thanks again for the answer :). OK, but do I need to declare the environment field as I did before? Consider that I'm using DABs.
Like this?
environments:
  - environment_key: env_for_data_pipelines_whl
    spec:
      client: "2"
edit: I defined it before the tasks, expecting each task to inherit this specific environment client, but it isn't applied... I still have the same problem.
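For reference, the full shape I'm trying is roughly this (a sketch that combines my first snippet with client "2"; as the earlier error shows, the task environment may still be rejected for notebook tasks):
environments:
  - environment_key: env_for_data_pipelines_whl
    spec:
      client: "2"
      dependencies:
        - ${workspace.artifact_path}/.internal/data_pipelines-0.0.1-py3-none-any.whl
tasks:
  - task_key: schedule_next_run_for_this_job
    # the environment is referenced explicitly per task, as in my first snippet,
    # rather than inherited automatically
    environment_key: env_for_data_pipelines_whl
    notebook_task:
      notebook_path: ../notebook/jobs/export.py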