Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Python environment DAB

Dali1
New Contributor III

Hello,

I am building a pipeline using DAB (Databricks Asset Bundles).

The first step of the bundle is to deploy my library as a wheel.

The pipeline runs on a shared Databricks cluster.
When I run the job, it does not use the exact requirements I specified; instead it uses the library versions that were already installed on the cluster.

Is there a way to get a dedicated environment for my job while using this shared cluster?

1 ACCEPTED SOLUTION

Accepted Solutions

pradeep_singh
Contributor III

This is the nature of shared clusters. You can install libraries for a task, but isolation is not guaranteed: if a library is already installed on the cluster, it takes priority over what is defined for the task.

Is there a reason you can't use job clusters? They are cheap and provide the isolation you need here. Another option is serverless jobs with environment specs.

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

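As a sketch of the job-cluster route in a bundle, assuming a typical DAB layout (the resource name, package name, node type, and wheel path are illustrative, not from the thread):

```yaml
# resources/my_job.yml -- illustrative DAB job running on its own job cluster,
# so the task gets a fresh environment instead of inheriting shared-cluster libraries
resources:
  jobs:
    my_wheel_job:
      name: my-wheel-job
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge      # choose a node type for your cloud
            num_workers: 2
      tasks:
        - task_key: run_lib
          job_cluster_key: main
          python_wheel_task:
            package_name: my_lib         # hypothetical package name
            entry_point: main
          libraries:
            - whl: ../dist/*.whl         # the wheel built and deployed by the bundle
```

Because the cluster is created fresh for each run, only the libraries listed under the task are installed, which avoids the version clashes described in the question.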


stbjelcevic
Databricks Employee

Hi @Dali1,

+1 to @pradeep_singh. On shared clusters, tasks inherit cluster-installed libraries, so you won't get a clean, versioned environment. Use a job cluster (new_cluster) or switch to serverless jobs with an environment per task for isolation. With serverless, define environments on the job and set environment_key on your wheel/script task. List your wheel and requirements in dependencies to pin exact versions.
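A minimal sketch of the serverless route described above, assuming a bundle-relative wheel path and illustrative names (the job name, package name, and pinned dependency are examples, not from the thread):

```yaml
# Illustrative DAB job on serverless compute with a pinned per-task environment:
# the environment spec, not the cluster, decides which library versions are used
resources:
  jobs:
    my_serverless_job:
      name: my-serverless-job
      environments:
        - environment_key: default
          spec:
            client: "2"
            dependencies:
              - ../dist/*.whl          # your wheel, built by the bundle
              - pandas==2.2.2          # pin exact versions here
      tasks:
        - task_key: run_lib
          environment_key: default     # binds the task to the environment above
          python_wheel_task:
            package_name: my_lib       # hypothetical package name
            entry_point: main
```

Each task that references `environment_key: default` runs with exactly the dependencies listed in the spec, which gives the reproducible environment the shared cluster could not.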