Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managing libraries in workflows with multiple tasks - need to configure a list of libs for all tasks

brian999
Contributor

I have workflows with multiple tasks, each of which needs 5 different libraries to run. When I have to update those libraries, I have to go in and make the update in each and every task. So for one workflow I have 20 different places where I have to go through and update the libraries.

I need to be able to designate a list of libraries to be available on the job cluster for all the tasks that use it, so that I only have to update the libraries in one place.

But from what I can tell, an entirely new cluster definition gets created for job compute every time the workflow runs, so I don't have a single cluster to configure. What am I missing?
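
For reference, here's roughly what each task looks like in the job's JSON definition today (Jobs API 2.1 style; task names and wheel paths below are made up). The same libraries block has to be repeated in every task, which is what I want to avoid:

{
  "tasks": [
    {
      "task_key": "ingest",
      "job_cluster_key": "main_cluster",
      "notebook_task": { "notebook_path": "/Repos/my-repo/notebooks/ingest" },
      "libraries": [
        { "whl": "/Workspace/Repos/my-repo/dist/lib_one-1.0.0-py3-none-any.whl" },
        { "whl": "/Workspace/Repos/my-repo/dist/lib_two-1.0.0-py3-none-any.whl" }
      ]
    },
    {
      "task_key": "transform",
      "depends_on": [ { "task_key": "ingest" } ],
      "job_cluster_key": "main_cluster",
      "notebook_task": { "notebook_path": "/Repos/my-repo/notebooks/transform" },
      "libraries": [
        { "whl": "/Workspace/Repos/my-repo/dist/lib_one-1.0.0-py3-none-any.whl" },
        { "whl": "/Workspace/Repos/my-repo/dist/lib_two-1.0.0-py3-none-any.whl" }
      ]
    }
  ]
}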

2 ACCEPTED SOLUTIONS

brian999
Contributor

Actually I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters...

It seems like I only have to define libs for the first task, and as long as all other tasks use the same job compute, I'm good to go. I'm assuming tasks within a workflow share compute by default?
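
In practice the job JSON ends up looking something like this sketch (cluster spec, task names, and wheel paths are made up): the cluster is defined once under job_clusters, every task points at the same job_cluster_key, and the libraries are attached only to the first task, since they seem to stay installed on the shared cluster for the rest of the run:

{
  "name": "my_workflow",
  "job_clusters": [
    {
      "job_cluster_key": "main_cluster",
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "ingest",
      "job_cluster_key": "main_cluster",
      "notebook_task": { "notebook_path": "/Repos/my-repo/notebooks/ingest" },
      "libraries": [
        { "whl": "/Workspace/Repos/my-repo/dist/lib_one-1.0.0-py3-none-any.whl" },
        { "whl": "/Workspace/Repos/my-repo/dist/lib_two-1.0.0-py3-none-any.whl" }
      ]
    },
    {
      "task_key": "transform",
      "depends_on": [ { "task_key": "ingest" } ],
      "job_cluster_key": "main_cluster",
      "notebook_task": { "notebook_path": "/Repos/my-repo/notebooks/transform" }
    }
  ]
}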


4 REPLIES

brian999
Contributor

The libs I need to install are all private and not on PyPI. They are .whl files in repo folders. Can that be done with a requirements.txt file?

daniel_sahal
Esteemed Contributor

 

@brian999 

It should be doable.

In requirements.txt you can specify the paths to the .whl files.
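
Something like this, for example - a requirements.txt checked into the repo (the paths below are just placeholders):

# requirements.txt - private wheels built from this repo
/Workspace/Repos/my-repo/dist/lib_one-1.0.0-py3-none-any.whl
/Workspace/Repos/my-repo/dist/lib_two-1.0.0-py3-none-any.whl

Then each task only needs a single library entry pointing at that one file, e.g. { "requirements": "/Workspace/Repos/my-repo/requirements.txt" } (assuming your workspace and runtime version support requirements.txt job libraries), so bumping a wheel version becomes a one-line change in one place.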

