07-10-2024 10:03 AM
I have workflows with multiple tasks, each of which needs 5 different libraries to run. When I have to update those libraries, I have to go in and make the update in each and every task, so for one workflow there are 20 different places where I have to go through and update the libraries.
I need to be able to designate a list of libraries to be available on the job cluster for all the tasks that use it, so that I only have to update the libraries in one place.
But from what I can tell, an entirely new cluster definition gets created for job compute every time the workflow runs, so I don't have a single cluster to configure. What am I missing?
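For context, the Jobs API lets you declare a shared cluster once under `job_clusters` and point each task at it with `job_cluster_key`, so libraries installed for that cluster are available to every task running on it. A minimal sketch of a job definition (cluster spec, library paths, and notebook paths below are made-up examples, not from this thread):

```json
{
  "name": "example-workflow",
  "job_clusters": [
    {
      "job_cluster_key": "shared_cluster",
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "task_1",
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Workspace/Users/me/task_1" },
      "libraries": [
        { "whl": "/Workspace/Repos/me/my-repo/dist/mylib-1.0.0-py3-none-any.whl" }
      ]
    },
    {
      "task_key": "task_2",
      "depends_on": [ { "task_key": "task_1" } ],
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Workspace/Users/me/task_2" }
    }
  ]
}
```

With this shape, the cluster definition (and anything installed on it) lives in one place in the job spec, even though job compute is still created fresh for each run.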
07-11-2024 08:22 AM
Actually I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters...
It seems like I only have to define the libs on the first task, and as long as all the other tasks use the same job compute, I'm good to go. Do tasks within a workflow share compute by default?
07-11-2024 05:34 AM
The libs I need to install are all private and not on PyPI. They are .whl files in repo folders. Can that be done with a requirements.txt file?
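As far as I know, pip requirements files accept direct file paths as well as package names, so a requirements.txt checked into the repo can list workspace paths to private wheels (the paths below are hypothetical examples):

```
# requirements.txt kept alongside the job code; each line is a direct path to a private wheel
/Workspace/Repos/me/my-repo/dist/libone-1.2.0-py3-none-any.whl
/Workspace/Repos/me/my-repo/dist/libtwo-0.9.1-py3-none-any.whl
```

Note that requirements.txt as a cluster library source is only supported on newer Databricks Runtime versions; on older runtimes the individual .whl files can still be attached as `whl` library entries instead.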