<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Package installation for multi-tasks job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114459#M44833</link>
    <description>&lt;P&gt;You can attach the custom library directly to both tasks as a dependent library, installing it from a volume, a custom storage location (abfss), or a workspace path.&lt;/P&gt;&lt;P&gt;There is no need for a separate Task 0 just to install libraries.&lt;/P&gt;&lt;P&gt;Hope this helps! 🙂&lt;/P&gt;</description>
    <pubDate>Thu, 03 Apr 2025 20:53:20 GMT</pubDate>
    <dc:creator>srinum89</dc:creator>
    <dc:date>2025-04-03T20:53:20Z</dc:date>
    <item>
      <title>Package installation for multi-tasks job</title>
      <link>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114452#M44828</link>
      <description>&lt;P&gt;I have a job that runs the same task twice with two different sets of parameters. Each task clones a Git repo, installs the package locally, and then runs a notebook from that repo. Since both tasks clone the same repo, I was wondering how to do the install only once.&lt;/P&gt;&lt;P&gt;I tried adding a first task that installs the package from the cloned repo and making the two tasks depend on it. Basically:&lt;/P&gt;&lt;P&gt;Task 0:&lt;BR /&gt;&amp;nbsp; &amp;nbsp;* from git repo&lt;BR /&gt;&amp;nbsp; &amp;nbsp;* %sh&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; pip install poetry&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; poetry install&amp;nbsp; ---&lt;EM&gt;installs the locally cloned package named my_package&lt;/EM&gt;---&lt;/P&gt;&lt;P&gt;Task 1 and 2:&lt;BR /&gt;&amp;nbsp; &amp;nbsp;* depend on Task 0&lt;BR /&gt;&amp;nbsp; &amp;nbsp;* same cluster&lt;BR /&gt;&amp;nbsp; &amp;nbsp;* from my_package import my_class&amp;nbsp; ---&lt;EM&gt;raises an exception saying there is no package named my_package&lt;/EM&gt;---&lt;/P&gt;&lt;P&gt;Adding my_package to the cluster configuration is not an option; I need to install it when the job runs.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 18:59:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114452#M44828</guid>
      <dc:creator>Guigui</dc:creator>
      <dc:date>2025-04-03T18:59:48Z</dc:date>
    </item>
    <item>
      <title>Re: Package installation for multi-tasks job</title>
      <link>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114459#M44833</link>
      <description>&lt;P&gt;You can attach the custom library directly to both tasks as a dependent library, installing it from a volume, a custom storage location (abfss), or a workspace path.&lt;/P&gt;&lt;P&gt;There is no need for a separate Task 0 just to install libraries.&lt;/P&gt;&lt;P&gt;Hope this helps! 🙂&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 20:53:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114459#M44833</guid>
      <dc:creator>srinum89</dc:creator>
      <dc:date>2025-04-03T20:53:20Z</dc:date>
    </item>
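    <!--
The reply above suggests attaching the library to each task instead of running a separate install task. A minimal sketch of what that might look like follows, written as a Python script that builds a Jobs API style task list. The wheel path, cluster key, notebook path, and parameter names are all hypothetical placeholders, and the sketch assumes the package has been built as a wheel and uploaded to a volume, which the original poster would rather avoid.

```python
import json

# Hypothetical wheel location; a workspace path or abfss URI could be used instead.
WHEEL_PATH = "/Volumes/main/default/libs/my_package-0.1.0-py3-none-any.whl"


def build_task(task_key, parameters):
    """Return one notebook task that lists the wheel as a dependent library."""
    return {
        "task_key": task_key,
        "job_cluster_key": "shared_cluster",  # hypothetical cluster key
        "notebook_task": {
            "notebook_path": "notebooks/run",  # hypothetical path inside the repo
            "source": "GIT",
            "base_parameters": parameters,
        },
        # Each task declares the library itself, so no install task is needed.
        "libraries": [{"whl": WHEEL_PATH}],
    }


def build_job_tasks(parameter_sets):
    """Build one task per parameter set, with no dependency between tasks."""
    return [
        build_task(f"task_{index}", parameters)
        for index, parameters in enumerate(parameter_sets, start=1)
    ]


# Hypothetical parameter sets for the two runs of the same notebook.
tasks = build_job_tasks([{"mode": "a"}, {"mode": "b"}])
print(json.dumps({"tasks": tasks}, indent=2))
```

Because neither task declares a dependency on the other, both can start in parallel, and each one gets the library installed in its own environment.
    -->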
    <item>
      <title>Re: Package installation for multi-tasks job</title>
      <link>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114460#M44834</link>
      <description>&lt;P&gt;That's what I've done, but I find it less elegant than setting up an environment once and sharing it across multiple tasks. That seems to be impossible (unless I build a wheel file, which I don't want to do) because tasks do not share an environment. In any case, since the tasks run in parallel, there is no real overhead in installing the package for each task.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 20:57:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/package-installation-for-multi-tasks-job/m-p/114460#M44834</guid>
      <dc:creator>Guigui</dc:creator>
      <dc:date>2025-04-03T20:57:34Z</dc:date>
    </item>
  </channel>
</rss>

