Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managing libraries in workflows with multiple tasks - need to configure a list of libs for all tasks

brian999
New Contributor III

I have workflows with multiple tasks, each of which needs 5 different libraries to run. When I have to update those libraries, I have to go in and make the update in each and every task. So for one workflow I have 20 different places where I have to go through and update the libraries.

I need to be able to designate a list of libraries to be available on the job cluster for all the tasks that use it, so that I only have to update the libraries in one place.

But from what I can tell, an entirely new cluster definition gets created for job compute every time the workflow runs, so I don't have a single cluster to configure. What am I missing?

2 ACCEPTED SOLUTIONS


daniel_sahal
Esteemed Contributor

brian999
New Contributor III

Actually I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters...

It seems like I only have to define libs for the first task, and as long as all other tasks use the same job compute, I'm good to go. I'm assuming tasks within a workflow share compute by default?


4 REPLIES

brian999
New Contributor III

The libs I need to install are all private and not on Pypi. They are .whl files in repo folders. Can that be done with a requirements.txt file?

daniel_sahal
Esteemed Contributor


@brian999 

It should be doable.

In requirements.txt you can specify the path to each .whl file.
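A sketch of what that requirements.txt might look like for wheels stored in a repo folder in the workspace (the package names and paths below are illustrative, not from the original thread):

```text
# requirements.txt — private wheels referenced by workspace path (illustrative paths)
/Workspace/Repos/my-org/my-repo/dist/internal_utils-1.2.0-py3-none-any.whl
/Workspace/Repos/my-org/my-repo/dist/data_models-0.4.1-py3-none-any.whl
```

Updating a library version then means changing one line in this file rather than editing every task.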

brian999
New Contributor III

Actually I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters...

It seems like I only have to define libs for the first task, and as long as all other tasks use the same job compute, I'm good to go. I'm assuming tasks within a workflow share compute by default?
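To make the "define once, share everywhere" approach concrete: tasks share compute when they all reference the same `job_cluster_key`, and the libraries attached to that shared cluster are installed before any task runs on it. Below is a hedged sketch of a job definition in the Jobs JSON style; the cluster spec, task names, notebook paths, and wheel path are all hypothetical placeholders:

```json
{
  "name": "example-workflow",
  "job_clusters": [
    {
      "job_cluster_key": "shared_cluster",
      "new_cluster": {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "first_task",
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Workspace/Repos/my-org/my-repo/notebooks/task1" },
      "libraries": [
        { "whl": "/Workspace/Repos/my-org/my-repo/dist/internal_utils-1.2.0-py3-none-any.whl" }
      ]
    },
    {
      "task_key": "second_task",
      "depends_on": [ { "task_key": "first_task" } ],
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Workspace/Repos/my-org/my-repo/notebooks/task2" }
    }
  ]
}
```

Note that sharing is not automatic: each task must explicitly point at the same `job_cluster_key` for the installed libraries to be available to all of them.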
