Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Install python packages on serverless compute in DLT pipelines (using asset bundles)

sandy311
New Contributor III

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks?
I didn’t see any direct option for this, apart from installing packages manually within a notebook.

I tried installing packages on DLT serverless compute via asset bundles using the following approach, but it doesn’t seem to apply the package correctly:

 

resources:
  jobs:
    xyz:
      name: x_y_z

      tasks:
        - task_key: PipelineTask
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
          libraries:
            - pypi:
                package: pandera
                repo: https://pypi.org/simple/

      queue:
        enabled: true
      max_concurrent_runs: 1

      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - pandera

 

sandeepss
3 REPLIES

cgrant
Databricks Employee

Environments are the way to incorporate third-party libraries with serverless compute.

In the provided example, the environment has been defined correctly, but it needs to be linked to the job task. You can do this by adding an environment_key in the task definition, like this:

# A serverless job (environment spec)
resources:
  jobs:
    serverless_job_environment:
      name: serverless_job_environment

      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py

          # The key that references an environment spec in a job.
          # https://docs.databricks.com/api/workspace/jobs/create#tasks-environment_key
          environment_key: default

      # A list of task execution environment specifications that can be referenced by tasks of this job.
      environments:
        - environment_key: default

          # Full documentation of this spec can be found at:
          # https://docs.databricks.com/api/workspace/jobs/create#environments-spec
          spec:
            client: '1'
            dependencies:
              - my-library
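
For a job that wraps a DLT pipeline, as in the original post, the linkage is the same idea. The snippet below is a sketch merging the asker's task with the environment spec above (the environments block itself stays exactly as the asker already wrote it):

      tasks:
        - task_key: PipelineTask
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
          # Reference the environment instead of a per-task libraries block
          environment_key: default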

 

sandy311
New Contributor III

I know this works with tasks like notebooks, Python scripts, etc., but it won't work with DLT pipelines.

sandeepss

mark_ott
Databricks Employee

Installing Python packages on Databricks serverless compute via asset bundles is possible, but there are some unique limitations and required configuration adjustments compared to traditional jobs or job tasks. The core methods to install packages for serverless workloads involve either asset bundles’ environment sections or using Python wheel files for dependencies.

Key Findings

  • Asset Bundles and Environments: To add third-party libraries to DLT serverless pipelines, you must use the environments section within your asset bundle definition. However, simply specifying the dependencies in the environment block isn't enough; you need to explicitly reference the environment in the task itself. Without this reference, your custom or external packages are not installed at runtime.

  • Linking Environment to Task: The environment key defined under environments must be linked in your pipeline/job task using environment_key. This ensures your pipeline attempts to pull in the dependencies you listed.

  • Supported Package Types: Installing packages via asset bundles is most predictable when you package dependencies as Python wheel files (.whl) and list them in the environment's dependencies property. Pip/conda-style installs vary more: pip-installing directly from PyPI within the configuration may not work as seamlessly on serverless compute as it does on standard clusters.

  • Manual Install Still Works: You can still install packages at runtime in notebooks using %pip install ..., but this defeats full automation and reproducibility via asset bundles.

  • Limitations: JAR/Maven packages and direct custom data source connections are not supported on serverless; support is Python-centric.

Recommended Solution

Update your job/task configuration as follows:

resources:
  jobs:
    xyz:
      name: x_y_z

      tasks:
        - task_key: PipelineTask
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
          environment_key: default   # <-- Link environment here

      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - pandera

This binding ensures the default environment (which lists pandera as a dependency) is actually used when the pipeline runs.
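
After updating the configuration, redeploy the bundle with databricks bundle deploy and re-run the job (for example, databricks bundle run xyz, using the job's resource key); changes to the bundle YAML only take effect once the bundle has been redeployed.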

Alternative (Wheel Packaging)

If you have more complex dependencies or custom code, pre-package your dependencies (or your code and dependencies) as a wheel file and reference them in your bundle, which is well-supported and robust:

environments:
  - environment_key: myenv
    spec:
      dependencies:
        - dist/my_package-0.1.0-py3-none-any.whl
# Reference the environment_key in the task as shown above.
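
If you also want the bundle to build the wheel for you, a minimal sketch using the bundle's artifacts mapping is shown below. It assumes a Python project (pyproject.toml or setup.py) at the bundle root whose build output lands in dist/; the artifact key and paths are placeholders, not something from this thread:

artifacts:
  my_package:            # placeholder artifact key
    type: whl            # build a wheel during `databricks bundle deploy`
    path: .              # directory containing pyproject.toml / setup.py

# Then point the environment at the built wheel, with the path relative to
# the YAML file that defines the job, for example:
#   dependencies:
#     - ./dist/*.whl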

Summary Table

Installation Approach         | Works on Serverless? | Notes
------------------------------|----------------------|---------------------------
pip in notebook               | Yes                  | Manual, not reproducible
Asset bundle, env not linked  | No                   | Must link environment_key
Asset bundle with wheel file  | Yes                  | Best for custom code
Asset bundle w/ PyPI in env   | Yes (if linked)      | Use dependencies block
JAR/Maven dependencies        | No                   | Not supported
 
 

For best results, package dependencies in a wheel, reference it in your bundle environment, and always link your environment_key in your job/task definition. If your use case is still not supported, consider a manual %pip install in a notebook or check the latest Databricks documentation on serverless package management.
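
One thing worth checking in particular, as a possible alternative to wrapping the pipeline in a job: newer serverless pipeline settings expose a pipeline-level environment with a dependencies list. The sketch below is an assumption based on that Pipelines API field (serverless, environment, and dependencies are not taken from this thread), so verify it against the current docs before relying on it:

# Assumed sketch: install the package on the pipeline itself rather than on a
# wrapping job task. Verify the environment field against current Pipelines docs.
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      serverless: true
      environment:
        dependencies:
          - pandera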
