
wheel package to install in a serverless workflow

jeremy98
Contributor

Hi guys,
What is the way, with Databricks Asset Bundles, to declare a new job definition that uses serverless compute for each task of the workflow, so that inside each notebook task it is possible to import the dependent custom libraries I uploaded to the workspace?

I did something like this:

      environments:
      - environment_key: envir
        spec:
          client: "1"
          dependencies:
            - "${workspace.root_path}/artifacts/.internal/data_pipelines-0.0.1-py3-none-any.whl"

      tasks:

        - task_key: schedule_next_run_for_this_job
          description: due to business requirements is needed to reschedule the workflow in the near next run
          environment_key: envir
          notebook_task:
            notebook_path: ../notebook/jobs/export.py
            base_parameters:
              function: schedule_next_run_for_this_job
              env: ${bundle.target}
              job_id: "{{job.id}}"
              workspace_url: "{{workspace.url}}"

but it returns:

Error: cannot create job: A task environment can not be provided for notebook task get_email_infos. Please use the %pip magic command to install notebook-scoped Python libraries and Python wheel packages


Is the only way to use a personal wheel package on serverless compute to install that library inside the notebook?

Because I would like to do something like:

libraries:
   - whl: ...
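
For reference, this is roughly the per-task pattern I mean, as it works for jobs on classic (non-serverless) compute, reusing the same wheel path as in the environments block above:

      tasks:
        - task_key: schedule_next_run_for_this_job
          notebook_task:
            notebook_path: ../notebook/jobs/export.py
          libraries:
            - whl: ${workspace.root_path}/artifacts/.internal/data_pipelines-0.0.1-py3-none-any.whl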

 


Alberto_Umana
Databricks Employee

Hi @jeremy98,

It appears that you are trying to use an environment block to specify dependencies for a notebook task, but this approach is not supported for notebook tasks on serverless compute. Instead, you should use the %pip magic command within the notebook to install the required libraries.

 

  • Create the job definition with the necessary tasks. Each task should specify the notebook path and any parameters required.
  • Use the %pip magic command inside each notebook to install the custom libraries. This ensures that the libraries are available in the notebook's environment when the task runs.

 

Here’s an example:

 

bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: schedule_next_run_for_this_job
          description: due to business requirements is needed to reschedule the workflow in the near next run
          notebook_task:
            notebook_path: /Workspace/Users/your_username/notebook/jobs/export.py
            base_parameters:
              function: schedule_next_run_for_this_job
              env: ${bundle.target}
              job_id: "{{job.id}}"
              workspace_url: "{{workspace.url}}"

targets:
  dev:
    default: true
    resources:
      jobs:
        my-job:
          name: my-job

 

The example content of export.py:

 

 

# Install custom libraries using the %pip magic command
%pip install /Workspace/Shared/Path/To/your_custom_library.whl

# Your notebook code here
def schedule_next_run_for_this_job():
    # Function implementation
    pass

# Call the function
schedule_next_run_for_this_job()

Hi,
Thanks for this answer! But should any code from the wheel package then be imported like this, for example?

 

 

from data_pipelines.core.utils.filters import (
    filter_by_time_granularity
)

 

 

Alberto_Umana
Databricks Employee

Yes, you can import code from a wheel package in your notebook just like you would with any other Python module. Once you have installed the wheel package using %pip, you can import the functions or classes from the package.

For example, if your wheel package contains a module data_pipelines.core.utils.filters and you want to import the filter_by_time_granularity function, you can do it as follows:

%pip install /Workspace/Shared/Path/To/your_custom_library.whl

from data_pipelines.core.utils.filters import filter_by_time_granularity

Hi, mmm ok, but how do I upload the wheel package on every DAB deploy? Because I did it in this way:

artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl

But this will deploy the wheel package to my personal root_path:

  stg:
    default: true
    workspace: 
      host: <host-id>
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

How can I specify that the wheel package should be uploaded to a shared location every time?

And another question: when installing the Python wheel on serverless compute, is it also possible to specify the Python version of the serverless compute? Because I tried to do it, but it says: ERROR: Package 'data-pipelines' requires a different Python: 3.10.12 not in '<4.0,>=3.11'

Alberto_Umana
Databricks Employee

You can define the artifact_path in the workspace mapping:

This path should be a shared location accessible by all users who need to use the wheel package.

 

bundle:
  name: my-bundle

artifacts:
  lib:
    type: whl
    build: poetry build
    path: .

sync:
  include:
    - ./dist/*.whl

workspace:
  artifact_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}

targets:
  stg:
    default: true
    workspace:
      host: <host-id>
      root_path: /Workspace/Shared/Path/To/Shared/Location/.bundle/${bundle.name}/${bundle.target}

artifact_path: This specifies the path where the artifacts (wheel packages) will be stored in the workspace. By setting it to a shared location, you ensure that the wheel package is accessible to all users.
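
For example, once the bundle is deployed with that artifact_path, the notebook can point %pip at the shared location. A minimal sketch, assuming the default DAB artifact layout (a .internal subfolder under artifact_path) and the wheel name used earlier in this thread; for a bundle named my-bundle and target stg the resolved path would look roughly like:

# Hypothetical resolved path; the exact layout under artifact_path may differ in your deployment
%pip install /Workspace/Shared/Path/To/Shared/Location/.bundle/my-bundle/stg/.internal/data_pipelines-0.0.1-py3-none-any.whl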

Alberto_Umana
Databricks Employee

About your second question: it is not possible to specify the Python version directly when installing a Python wheel. The serverless runtime comes with a built-in Python version, and upgrading or downgrading it could break the system due to dependencies.
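
If you want to double-check which Python version a given serverless environment is actually running, a quick check from a notebook cell (standard library only) is:

import sys

# Print the interpreter version of the current serverless environment,
# e.g. 3.10.x or 3.11.x depending on the environment version
print(sys.version)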

Hi, thanks for your answers, really helpful. But does this mean I should find a way to downgrade the Python version specified in my pyproject.toml (and make all of my dependencies match it), in order to be able to run the package on any serverless cluster?

Because I don't know which Python version I will get every time, right?

Alberto_Umana
Databricks Employee

Hi, no problem! Serverless will use the latest DBR version mentioned here: https://docs.databricks.com/en/release-notes/serverless/index.html#version-154, and the Python version follows from that. In this case it is DBR 15.4 LTS, which uses:

  • Python: 3.11.0

So we need to refactor any dependencies to be compatible with that Python version, and keep checking whether any release update comes with a different DBR/Python version.

Hi Alberto, thanks for the answer again, but I don't understand your point. You said the current cluster also works with Python 3.11, but it seems that when I got a new serverless cluster it didn't have Python 3.11 but a lower version. What do I need to do?

Alberto_Umana
Databricks Employee

Hey Jeremy, serverless should be using 3.11 too, do you see a different version? Serverless should pick DBR version 15.4, which uses 3.11, based on https://docs.databricks.com/en/release-notes/serverless/index.html#version-154

Alberto_Umana
Databricks Employee

Oh, I see the error above, Python: 3.10.12 not in '<4.0,>=3.11'. I just tested it and it is indeed using 3.10, let me check.

Alberto_Umana
Databricks Employee

I see the reason now: there are two versions of the serverless environment. Version 1 uses Python 3.10.12 and version 2 uses 3.11, please see: https://docs.databricks.com/en/release-notes/serverless/client-two.html


 

Hi,
Thanks again for the answer :). Ok, but do I need to declare the environment field as I did before? Consider that I'm using DABs.

Like this?

 

      environments: 
        - environment_key: env_for_data_pipelines_whl
          spec: 
            client: "2"

edit: I defined it before the tasks, so that each task inherits the specified environment client, but it isn't applied... I still have the same problem

 
