Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Install python packages on serverless compute in DLT pipelines (using asset bundles)

sandy311
New Contributor III

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks?
I didn’t see any direct option for this, apart from installing packages manually within a notebook.

I tried installing packages on DLT serverless compute via asset bundles using the following approach, but it doesn’t seem to apply the package correctly:

 

resources:
  jobs:
    xyz:
      name: x_y_z

      tasks:
        - task_key: PipelineTask
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
          libraries:
            - pypi:
                package: pandera
                repo: https://pypi.org/simple/

      queue:
        enabled: true
      max_concurrent_runs: 1

      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - pandera

 

sandeepss
2 REPLIES

cgrant
Databricks Employee

Environments are the way to incorporate third-party libraries with serverless compute.

In the provided example, the environment is correctly defined, but it needs to be linked to the job task. You can do this by adding an environment_key to the task definition, like this:

# A serverless job (environment spec)
resources:
  jobs:
    serverless_job_environment:
      name: serverless_job_environment

      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py

          # The key that references an environment spec in a job.
          # https://docs.databricks.com/api/workspace/jobs/create#tasks-environment_key
          environment_key: default

      # A list of task execution environment specifications that can be referenced by tasks of this job.
      environments:
        - environment_key: default

          # Full documentation of this spec can be found at:
          # https://docs.databricks.com/api/workspace/jobs/create#environments-spec
          spec:
            client: '1'
            dependencies:
              - my-library

 

sandy311
New Contributor III

I know this works with tasks like notebooks, Python scripts, etc., but it won't work with DLT pipelines.
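
For anyone hitting the same wall, here is a minimal sketch of a pipeline-level alternative: instead of attaching libraries to the job task, declare the dependencies in an environment block on the serverless pipeline resource itself. This assumes your workspace and CLI version support the environment spec on serverless pipelines (see the Pipelines API create reference, https://docs.databricks.com/api/workspace/pipelines/create); the pipeline name, catalog, target, notebook path, and the pandera package below are placeholders, not settings taken from this thread.

# Hedged sketch: dependencies declared on the serverless pipeline itself,
# assuming the pipelines environment spec is available in your workspace.
# All names below are illustrative placeholders.
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      serverless: true
      catalog: main
      target: my_schema

      libraries:
        - notebook:
            path: ../src/dlt_pipeline.ipynb

      # Environment for serverless pipelines: a list of pip-style
      # dependency specifiers installed for the pipeline run.
      environment:
        dependencies:
          - pandera

After deploying with databricks bundle deploy, the pipeline's serverless environment should pick up the listed dependencies; running databricks bundle validate first should flag the block if your CLI version does not yet recognize it.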

sandeepss