<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Library installation in cluster taking a long time in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9147#M4618</link>
    <description>&lt;P&gt;I am trying to install the "pycaret" library on a cluster using a whl file.&lt;/P&gt;&lt;P&gt;But it sometimes creates a dependency conflict (not always; sometimes it works).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My questions are:&lt;/P&gt;&lt;P&gt;&lt;B&gt;1 - How can I install libraries on the cluster only once (maybe from a cache)? Currently it downloads and installs them every time I start the cluster.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It takes around 20 minutes to install this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;2 - How can I solve the dependency error, and why is it not always reproducible?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This might be due to a change in the numpy version, because the default runtime has &lt;B&gt;1.21.5&lt;/B&gt; and after the library installation it changes to &lt;B&gt;1.19.5&lt;/B&gt; (sometimes).&lt;/P&gt;&lt;P&gt;The error that I get is&lt;/P&gt;&lt;P&gt;"&lt;B&gt;ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject&lt;/B&gt;"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Another issue that follows when the above gets resolved (surprisingly) is&lt;/P&gt;&lt;P&gt;'&lt;B&gt;ImportError: Numba needs NumPy 1.20 or less&lt;/B&gt;', which also gets resolved after I re-run the cell.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can someone please help?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 17 Feb 2023 14:26:26 GMT</pubDate>
    <dc:creator>AyushModi038</dc:creator>
    <dc:date>2023-02-17T14:26:26Z</dc:date>
    <item>
      <title>Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9147#M4618</link>
      <description>&lt;P&gt;I am trying to install the "pycaret" library on a cluster using a whl file.&lt;/P&gt;&lt;P&gt;But it sometimes creates a dependency conflict (not always; sometimes it works).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My questions are:&lt;/P&gt;&lt;P&gt;&lt;B&gt;1 - How can I install libraries on the cluster only once (maybe from a cache)? Currently it downloads and installs them every time I start the cluster.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It takes around 20 minutes to install this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;2 - How can I solve the dependency error, and why is it not always reproducible?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This might be due to a change in the numpy version, because the default runtime has &lt;B&gt;1.21.5&lt;/B&gt; and after the library installation it changes to &lt;B&gt;1.19.5&lt;/B&gt; (sometimes).&lt;/P&gt;&lt;P&gt;The error that I get is&lt;/P&gt;&lt;P&gt;"&lt;B&gt;ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject&lt;/B&gt;"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Another issue that follows when the above gets resolved (surprisingly) is&lt;/P&gt;&lt;P&gt;'&lt;B&gt;ImportError: Numba needs NumPy 1.20 or less&lt;/B&gt;', which also gets resolved after I re-run the cell.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can someone please help?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Feb 2023 14:26:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9147#M4618</guid>
      <dc:creator>AyushModi038</dc:creator>
      <dc:date>2023-02-17T14:26:26Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9148#M4619</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Which DBR version are you using? Are you installing the library with an init script, or do you install it once the cluster is up and running? Do you see any error message while trying to install the library? Please check the driver logs.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2023 22:02:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9148#M4619</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2023-02-22T22:02:21Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9149#M4620</link>
      <description>&lt;P&gt;Hi Jose,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for the help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here are the requested details:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;DBR Version - &lt;B&gt;12.1 ML (includes Apache Spark 3.3.1, Scala 2.12)&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Installation Mode - Using cluster UI page -&amp;gt; Library Tab -&amp;gt; &lt;B&gt;Install New&lt;/B&gt;. It installs the library every time the cluster starts.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Error Messages in Driver Logs - &lt;/P&gt;&lt;P&gt;&lt;B&gt;ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject&lt;/B&gt;&lt;/P&gt;&lt;P&gt;It is due to a numpy version mismatch.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Feb 2023 09:53:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9149#M4620</guid>
      <dc:creator>AyushModi038</dc:creator>
      <dc:date>2023-02-27T09:53:07Z</dc:date>
    </item>
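A minimal sketch for the numpy mismatch reported above, assuming the ceiling of 1.20 quoted in the Numba error message applies to the installed Numba build. The helper names `version_tuple`, `numpy_ok_for_numba` and `installed_version` are hypothetical and only illustrate how the versions could be checked in a notebook cell before importing pycaret.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Turn '1.21.5' into (1, 21, 5); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def numpy_ok_for_numba(numpy_version: str, ceiling: tuple = (1, 20)) -> bool:
    """True when major.minor of numpy_version is at or below the ceiling.

    The default ceiling is an assumption taken from the error text
    'Numba needs NumPy 1.20 or less'; other Numba releases differ.
    """
    return ceiling >= version_tuple(numpy_version)[:2]


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("numpy")
    if found is None:
        print("numpy is not installed in this environment")
    else:
        print("numpy", found, "ok for numba:", numpy_ok_for_numba(found))
```

With the two versions mentioned in the thread, the check would reject 1.21.5 and accept 1.19.5, which matches the intermittent behaviour described: the outcome depends on which numpy is active when the import runs.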
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9150#M4621</link>
      <description>&lt;P&gt;Hi @Ayush Modi&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Sat, 11 Mar 2023 03:16:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/9150#M4621</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-11T03:16:41Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/48214#M28260</link>
      <description>&lt;P&gt;I'm having the exact same problem and it's causing issues when I run workflows, too. Please advise, Databricks.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Oct 2023 16:59:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/48214#M28260</guid>
      <dc:creator>efry</dc:creator>
      <dc:date>2023-10-04T16:59:53Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/71639#M34364</link>
      <description>&lt;P&gt;Can any Databricks pros provide some guidance on this? My clusters that have "cluster-installed" libraries take 30 minutes or more to become usable. I'm only trying to install a handful of CRAN libraries, but having to re-install them every time a cluster starts up is SO painful.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 17:27:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/71639#M34364</guid>
      <dc:creator>Spencer_Kent</dc:creator>
      <dc:date>2024-06-04T17:27:09Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/75856#M35082</link>
      <description>&lt;P&gt;I am experiencing a similar issue where a few libraries take 15 minutes to install when running a workflow. Could you please advise if there is a solution for this?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jun 2024 14:03:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/75856#M35082</guid>
      <dc:creator>shirlyb-melio</dc:creator>
      <dc:date>2024-06-26T14:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/76919#M35361</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt; What about question #1, which is what subsequent comments to this thread have been referring to? To recap the question: is it possible for "cluster-installed" libraries to be cached in such a way that they aren't completely reinstalled every time the cluster is started?&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 17:22:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/76919#M35361</guid>
      <dc:creator>Spencer_Kent</dc:creator>
      <dc:date>2024-07-05T17:22:18Z</dc:date>
    </item>
    <item>
      <title>Re: Library installation in cluster taking a long time</title>
      <link>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/112631#M44276</link>
      <description>&lt;P&gt;There is a way to bypass this problem.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Create a Databricks repo and include a folder called "Modules".&lt;/LI&gt;&lt;LI&gt;Download the whl/tar.gz of the Python module you'd like to install and unpack it to a folder (best practice: name the folder after the module itself).&lt;/LI&gt;&lt;LI&gt;Push that folder into the Modules folder in the Databricks repo.&lt;/LI&gt;&lt;LI&gt;In the notebook where you need that module, add this to the first cell: import sys; sys.path.append("&amp;lt;full-path-to-Modules&amp;gt;").&lt;/LI&gt;&lt;LI&gt;Import your module as usual.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;That way, you don't have to install the wheel every time your cluster starts.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 18:27:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/library-installation-in-cluster-taking-a-long-time/m-p/112631#M44276</guid>
      <dc:creator>fifata</dc:creator>
      <dc:date>2025-03-14T18:27:07Z</dc:date>
    </item>
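The workaround in the last reply can be sketched as below. This is only an illustration under stated assumptions: `add_modules_dir` is a hypothetical helper, and the "Modules" path in the usage line is a placeholder for the full path to the Modules folder in your own repo.

```python
import sys
from pathlib import Path


def add_modules_dir(modules_dir: str) -> bool:
    """Append modules_dir to sys.path once; return True if it was added."""
    resolved = str(Path(modules_dir).resolve())
    if resolved in sys.path:
        # Already importable from here, so re-running the cell is harmless.
        return False
    sys.path.append(resolved)
    return True


if __name__ == "__main__":
    # Placeholder path: replace with the full path to your Modules folder.
    added = add_modules_dir("Modules")
    print("added to sys.path:", added)
```

Note that this skips pip entirely, so the module's own dependencies are not resolved, and an unpacked wheel that contains compiled extensions has to match the Python version and platform of the cluster runtime. It is likely to suit pure-Python packages better than a package with a large dependency tree.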
  </channel>
</rss>

