Python environment DAB

Dali1
New Contributor III

Hello,
I am building a pipeline using DAB.

The first step of the DAB is to deploy my library as a wheel.

The pipeline runs on a shared Databricks cluster.
When I run the job, I see that it does not use exactly the requirements I specified; instead it uses the versions that were already installed on the cluster.

Is there a way to have a specific environment for my job on this shared cluster?

pradeep_singh
Contributor III

This is the nature of shared clusters. You can install libraries for a task, but isolation is not guaranteed. If a library is already installed on the cluster, it will take priority over what is defined for the task.

Is there any reason you can't use job clusters? They are cheap and provide the isolation you need in this case. Another option is serverless jobs with environment specs.
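
As a rough sketch of the job-cluster option, a bundle resource could look something like the fragment below. The job name, cluster key, package name, entry point, wheel path, Spark version, node type, and the pinned pandas version are all placeholders chosen for illustration, not values from this thread; adjust them to your workspace and cloud.

```yaml
# Hypothetical bundle resource: every name and version here is a placeholder.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: main_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12   # assumed runtime; pick one available to you
            node_type_id: i3.xlarge           # assumed AWS node type; cloud-specific
            num_workers: 1
      tasks:
        - task_key: main
          job_cluster_key: main_cluster       # run on the dedicated job cluster
          python_wheel_task:
            package_name: my_package
            entry_point: main
          libraries:
            - whl: ../dist/*.whl              # wheel built by the bundle; path is an assumption
            - pypi:
                package: pandas==2.2.2        # example of pinning an exact version
```

Because the cluster is created for the job run, libraries installed on the shared cluster do not carry over to it; only the runtime's preinstalled packages and the libraries listed on the task are present.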

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev


stbjelcevic
Databricks Employee

Hi @Dali1,

+1 to @pradeep_singh. On shared clusters, tasks inherit cluster-installed libraries, so you won’t get a clean, versioned environment. For isolation, use a job cluster (new_cluster) or switch to serverless jobs with an environment per task. With serverless, define job.environments and set environment_key on your wheel/script task. List your wheel and requirements in the environment's dependencies to pin exact versions.
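
For the serverless route, a minimal sketch of what that might look like in a bundle is shown below. The job name, task key, environment key, package name, entry point, wheel path, and pinned version are illustrative assumptions, and the `client` value may need to change (some newer CLI versions use an `environment_version` field instead), so check it against the schema of the CLI version you run.

```yaml
# Hypothetical bundle resource: names, paths, and versions are placeholders.
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          environment_key: default            # must match an entry under environments
          python_wheel_task:
            package_name: my_package
            entry_point: main
      environments:
        - environment_key: default
          spec:
            client: "1"                       # assumed environment version
            dependencies:
              - ../dist/*.whl                 # wheel built by the bundle; path is an assumption
              - pandas==2.2.2                 # pin exact versions here
```

No cluster is defined on the task, which is what makes it run on serverless compute; the dependencies list then plays the role of a pinned requirements file for that task.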