<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Impossibility to have multiple versions of the same Python package installed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/impossibility-to-have-multiple-versions-of-the-same-python/m-p/112050#M44088</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We package our Spark jobs and utilities in a custom package that is used in wheel tasks in Databricks. In my opinion, running several versions of a job (say "production" and "dev") on the same cluster against &lt;STRONG&gt;different&lt;/STRONG&gt; versions of this custom package is a completely valid requirement for a reasonably resource-friendly CI/CD workflow.&lt;/P&gt;&lt;P&gt;Alas, Databricks does not allow this: wheel libraries end up being installed &lt;STRONG&gt;cluster-wide&lt;/STRONG&gt;, and only one version of a given library is allowed at a time. To make matters worse, the cluster needs to be restarted to uninstall a library.&lt;/P&gt;&lt;P&gt;Since we cannot be the only team facing this issue, my question is: how can we work around this limitation? Rolling everything into one script would be ugly, and notebooks are not an option either.&lt;/P&gt;&lt;P&gt;Thank you, David&lt;/P&gt;</description>
    <pubDate>Fri, 07 Mar 2025 21:21:40 GMT</pubDate>
    <dc:creator>the_dude</dc:creator>
    <dc:date>2025-03-07T21:21:40Z</dc:date>
    <item>
      <title>Impossibility to have multiple versions of the same Python package installed</title>
      <link>https://community.databricks.com/t5/data-engineering/impossibility-to-have-multiple-versions-of-the-same-python/m-p/112050#M44088</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We package our Spark jobs and utilities in a custom package that is used in wheel tasks in Databricks. In my opinion, running several versions of a job (say "production" and "dev") on the same cluster against &lt;STRONG&gt;different&lt;/STRONG&gt; versions of this custom package is a completely valid requirement for a reasonably resource-friendly CI/CD workflow.&lt;/P&gt;&lt;P&gt;Alas, Databricks does not allow this: wheel libraries end up being installed &lt;STRONG&gt;cluster-wide&lt;/STRONG&gt;, and only one version of a given library is allowed at a time. To make matters worse, the cluster needs to be restarted to uninstall a library.&lt;/P&gt;&lt;P&gt;Since we cannot be the only team facing this issue, my question is: how can we work around this limitation? Rolling everything into one script would be ugly, and notebooks are not an option either.&lt;/P&gt;&lt;P&gt;Thank you, David&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 21:21:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/impossibility-to-have-multiple-versions-of-the-same-python/m-p/112050#M44088</guid>
      <dc:creator>the_dude</dc:creator>
      <dc:date>2025-03-07T21:21:40Z</dc:date>
    </item>
    <item>
      <title>Re: Impossibility to have multiple versions of the same Python package installed</title>
      <link>https://community.databricks.com/t5/data-engineering/impossibility-to-have-multiple-versions-of-the-same-python/m-p/112180#M44127</link>
      <description>&lt;P&gt;In case someone comes across this post: as per the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/libraries/#notebook-scoped-libraries" target="_self"&gt;documentation&lt;/A&gt;, library/package installation can be notebook-scoped. To work around the limitation described in the initial post, we are therefore experimenting with notebook tasks whose only responsibility is to install the custom library using %pip install and then call main() of the module that contains the actual processing logic.&lt;BR /&gt;&lt;BR /&gt;I am surprised that running PySpark jobs packaged as .whl in &lt;STRONG&gt;isolation&lt;/STRONG&gt; is not something that Databricks provides out of the box. Ways to do so, for instance via packaged virtual environments, are described in PySpark's documentation, and I would have expected Databricks to handle .whl tasks in such a way, without the user having to worry about one job interfering with another.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;</description>
      <pubDate>Mon, 10 Mar 2025 16:17:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/impossibility-to-have-multiple-versions-of-the-same-python/m-p/112180#M44127</guid>
      <dc:creator>the_dude</dc:creator>
      <dc:date>2025-03-10T16:17:50Z</dc:date>
    </item>
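    <!--
Editorial sketch of the notebook-task workaround described in the reply above. It only illustrates the shape of the idea: the notebook installs one version of the custom wheel with a notebook-scoped %pip install, then hands control to main() of the packaged module. The wheel path, the version, the module name my_jobs_demo and the helper run_entry_point are all hypothetical, and a stand-in module replaces the real package so the snippet runs outside Databricks.

```python
import importlib
import sys
import types


def run_entry_point(module_name, *args):
    """Import module_name and call its main() with the given arguments."""
    module = importlib.import_module(module_name)
    return module.main(*args)


# In a Databricks notebook, an earlier cell would hold the notebook-scoped
# install, for example (hypothetical path and version):
#   %pip install /path/to/my_jobs-1.2.0-py3-none-any.whl
# Outside Databricks there is no such wheel, so a stand-in module is
# registered here purely to keep the sketch runnable.
demo = types.ModuleType("my_jobs_demo")  # hypothetical module name
demo.main = lambda env: "ran my_jobs_demo for " + env
sys.modules["my_jobs_demo"] = demo

result = run_entry_point("my_jobs_demo", "dev")
print(result)
```

With this layout, a "production" and a "dev" notebook task could each install a different version of the wheel before calling the same entry point, which is the isolation the original post asks for. Whether this behaves as hoped on a shared cluster is something to verify against the linked documentation.
    -->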
  </channel>
</rss>

