Package installation for a multi-task job
2 weeks ago
I have a job that runs the same task twice with two sets of parameters. Each task clones a git repo, installs it locally, and runs a notebook from that repo. Since each task clones the same repo, I was wondering how to do the install once and for all?
I tried adding a first task that installs the package from the cloned repo, and made the two tasks depend on this first step. Basically:
Task 0:
* from git repo
* %sh
pip install poetry
poetry install ---installs the locally cloned package named my_package---
Task 1 and 2:
* depends on Task 0
* same cluster
* from my_package import my_class ---raises an exception saying there is no package my_package---
Adding the my_package package to the cluster config is not an option; I need to install it when the job runs.
2 weeks ago
You can install the custom library directly on the two tasks as a dependent library, from a volume, a custom (abfss) location, or a workspace path.
There is no need for a Task 0 just to install libraries.
Hope this helps! 🙂
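As a rough sketch of that suggestion, the two tasks could each declare the same dependent library in the job settings. The `task_key`, `notebook_task`, `base_parameters`, and `libraries` fields follow the Jobs API layout as I understand it; the wheel path, notebook path, job name, and parameter names are hypothetical placeholders, and cluster and git source settings are left out.

```python
import json

# Hypothetical wheel location; point this at wherever the built package is stored.
WHEEL_PATH = "/Volumes/main/default/libs/my_package-0.1.0-py3-none-any.whl"


def make_task(task_key, parameters):
    """Build one notebook task that installs the shared library before running."""
    return {
        "task_key": task_key,
        "notebook_task": {
            # Hypothetical notebook path inside the repo.
            "notebook_path": "notebooks/run_my_package",
            "base_parameters": parameters,
        },
        # Task-level dependent library: installed for this task only.
        "libraries": [{"whl": WHEEL_PATH}],
    }


job_settings = {
    "name": "my_package_job",  # hypothetical job name
    "tasks": [
        make_task("task_1", {"param_set": "A"}),
        make_task("task_2", {"param_set": "B"}),
    ],
}

print(json.dumps(job_settings, indent=2))
```

Both tasks carry the same `libraries` entry, so no separate install task or task dependency is needed. Note that this route assumes a packaged artifact such as a wheel already exists at that path.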
2 weeks ago
That's what I've done, but I find it less elegant than setting up an environment once and sharing it across multiple tasks. That seems to be impossible (unless I build a wheel file, which I don't want to do), as tasks do not share an environment. Anyway, since the tasks run in parallel, installing the package in each task adds no real overhead.
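For anyone landing here, a minimal sketch of the per-task install in plain Python: each task calls a small helper at the top of its notebook, which installs the cloned repo only if the module is not already importable. The helper name `ensure_installed` is made up for illustration, and the sketch assumes the repo root has a `pyproject.toml` that pip can build from. In a Databricks notebook, `%pip install` is the usual way to do the same thing.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, source_dir):
    """Install the package in source_dir unless module_name is already importable.

    Returns True if an install was run, False if the module was already available.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the interpreter running this task so the package lands in its environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", source_dir])
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True


# Example call at the top of each task's notebook (path is an assumption):
# ensure_installed("my_package", ".")
# from my_package import my_class
```

Because each task has its own environment, the helper runs once per task; the check just makes reruns within the same environment cheap.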

