Installing Python packages on Databricks serverless compute via asset bundles is possible, but it comes with some unique limitations and configuration adjustments compared to traditional jobs and job tasks. The two main approaches for serverless workloads are declaring dependencies in the asset bundle’s environments section or packaging them as Python wheel files.
Key Findings
- Asset Bundles and Environments: To add third-party libraries to serverless DLT pipelines, use the environments section within your asset bundle definition. Simply specifying the dependencies in the environment block isn’t enough; you must also explicitly reference the environment from the task itself. Without this reference, your custom or external packages are not installed at runtime.
- Linking Environment to Task: The environment defined under environments must be linked in your pipeline/job task via environment_key. This is what makes the pipeline actually pull in the dependencies you listed.
- Supported Package Types: Installing packages via asset bundles is most predictable when you package dependencies as Python wheel files (.whl) and list them in the environment’s dependencies property. Support for pip/conda-style installation varies, and pip-installing directly from PyPI within the configuration may not work as seamlessly on serverless compute as on standard clusters.
- Manual Install Still Works: You can still install packages at runtime in a notebook using %pip install ... (see the sketch after this list), but this gives up the automation and reproducibility that asset bundles provide.
- Limitations: JAR/Maven packages and direct custom data source connections are not supported on serverless compute; support is Python-centric.
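As a point of reference for the manual fallback above, here is a minimal sketch of what that looks like in a Databricks notebook cell, using pandera (the example package from the configuration below) with the understanding that the install lives outside the bundle:

# Manual fallback in a notebook cell: installed for this session only,
# not managed or reproduced by the asset bundle. Pin a version in real use.
%pip install pandera

# Restart the Python process so the freshly installed package can be imported.
dbutils.library.restartPython()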
Recommended Solution
Update your job/task configuration as follows:
resources:
  jobs:
    xyz:
      name: x_y_z
      # Serverless environments are declared on the job itself.
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - pandera
      tasks:
        - task_key: PipelineTask
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
          environment_key: default # <-- Link environment here
This binding ensures the default environment (which lists pandera as a dependency) is actually used when the pipeline runs.
Alternative (Wheel Packaging)
If you have more complex dependencies or custom code, pre-package your code and its dependencies as a wheel file and reference it in your bundle environment; this path is well supported and robust (a build sketch follows the snippet):
environments:
  - environment_key: myenv
    spec:
      client: "1"
      dependencies:
        - dist/my_package-0.1.0-py3-none-any.whl
# Place this under the job resource and reference the environment_key (myenv) in the task, as shown above.
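For completeness, one way to produce that wheel, assuming a standard pyproject.toml-based project (the package name and dist/ path are the illustrative ones above):

# Build a wheel into dist/ from a pyproject.toml-based project (illustrative layout).
pip install --upgrade build
python -m build --wheel
# Yields something like dist/my_package-0.1.0-py3-none-any.whl,
# i.e. the path referenced in the environment's dependencies above.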
Summary Table
| Installation Approach | Works on Serverless? | Notes |
|---|---|---|
| %pip in notebook | Yes | Manual, not reproducible |
| Asset bundle, env not linked | No | Must link environment_key |
| Asset bundle with wheel file | Yes | Best for custom code |
| Asset bundle w/ PyPI in env | Yes (if linked) | Use dependencies block |
| JAR/Maven dependencies | No | Not supported |
For best results, package dependencies in a wheel, reference it in your bundle environment, and always link the environment_key in your job/task definition. If your use case is still not supported, fall back to a manual %pip install in a notebook or check the latest Databricks documentation on serverless package management.
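As a closing sketch, a typical validate/deploy/run sequence with the Databricks CLI, using the job key from the example above (xyz); adjust targets and profiles to your setup:

# Check that the bundle configuration, including the environments block, is valid.
databricks bundle validate

# Deploy the bundle (uploads the wheel and the job/pipeline definitions).
databricks bundle deploy

# Trigger the job defined above; "xyz" is the resource key from the example configuration.
databricks bundle run xyz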