<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Installing linux packages on cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98963#M39872</link>
    <description>&lt;P&gt;Thanks Alberto!&amp;nbsp; There were 42 deb files, so I just changed my script to:&lt;/P&gt;&lt;P&gt;sudo dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/*.deb&lt;/P&gt;&lt;P&gt;The init_script log shows that it unpacks everything, sets the packages up, and processes the triggers, but the package is not actually installed, and I cannot find where it would have been installed to.&amp;nbsp; I do like this alternative if I can figure out how to get it to work.&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Scott&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2024 15:35:51 GMT</pubDate>
    <dc:creator>TX-Aggie-00</dc:creator>
    <dc:date>2024-11-15T15:35:51Z</dc:date>
    <item>
      <title>Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98725#M39820</link>
      <description>&lt;P&gt;Hey everyone!&amp;nbsp; We need to use LibreOffice in one of our automated tasks via a notebook.&amp;nbsp; I have tried to install it via an init script that I attach to the cluster, but sometimes the program gets installed and sometimes it doesn't.&amp;nbsp; For obvious reasons, I need to guarantee that these tasks run successfully.&amp;nbsp; Here is my init script that resides in my workspace:&amp;nbsp; "install_libreoffice.sh":&lt;/P&gt;&lt;LI-CODE lang="python"&gt;#!/bin/bash
echo "----------------INIT SCRIPT---------------"
apt-get update
echo "----------------Installing libreoffice---------------"
apt-get install -y libreoffice
echo "----------------Installing python3-uno---------------"
apt-get install -y python3-uno
echo "----------------Installing poppler-utils---------------"
apt-get install -y poppler-utils
echo "----------INIT SCRIPT COMPLETE------------"&lt;/LI-CODE&gt;&lt;P&gt;When it is successfully installed I can run the following cell and it returns expected results:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import subprocess
import os

result = subprocess.run(["libreoffice", "--version"], capture_output=True, text=True)
print(result.stdout)&lt;/LI-CODE&gt;&lt;P&gt;It will work, but after I terminate the cluster and start it again, it won't work.&amp;nbsp; I have viewed the init_script logs and everything looks good, but the program will not be installed and %sh ps will not show the process running.&lt;/P&gt;&lt;P&gt;I have tried to install it in a %sh cell, but that doesn't work either.&amp;nbsp; What is the best way to get this consistently installed on a cluster?&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Scott&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 00:39:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98725#M39820</guid>
      <dc:creator>TX-Aggie-00</dc:creator>
      <dc:date>2024-11-14T00:39:55Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98730#M39823</link>
      <description>&lt;P&gt;Forgot to mention a few items of interest:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The cluster is a single node, so I would think installing via "%sh" would be sufficient&lt;/LI&gt;&lt;LI&gt;DBR - 12.2 LTS&lt;/LI&gt;&lt;LI&gt;Node - Standard D8s_vs (Azure)&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 14 Nov 2024 03:46:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98730#M39823</guid>
      <dc:creator>TX-Aggie-00</dc:creator>
      <dc:date>2024-11-14T03:46:33Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98826#M39854</link>
      <description>&lt;P&gt;I think I determined the issue, just not sure how best to fix it.&amp;nbsp; It seems the apt-get repository doesn't always work.&amp;nbsp; I noticed that when the notebook fails, the init_script logs show a lot of 404 errors when downloading the packages.&amp;nbsp; When the notebook is successful, there are no errors and I can see the packages get installed.&lt;/P&gt;&lt;P&gt;I updated the runtime to 15.4 LTS and that seems to be working consistently for now, but I am a bit nervous that this issue will pop up again.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 16:10:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98826#M39854</guid>
      <dc:creator>TX-Aggie-00</dc:creator>
      <dc:date>2024-11-14T16:10:31Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98846#M39856</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132350"&gt;@TX-Aggie-00&lt;/a&gt;,&lt;/P&gt;
&lt;P class="p1"&gt;To ensure that LibreOffice is consistently installed on your Databricks cluster without relying on internet access (which can fail sometimes), you can manually download the necessary packages and store them in a Unity Catalog volume or a workspace location. Here’s a step-by-step guide:&lt;/P&gt;
&lt;OL class="ol1"&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Download the Packages&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;UL class="ul1"&gt;
&lt;LI class="li1"&gt;On a local machine, download the .deb packages for LibreOffice, python3-uno, and poppler-utils from a reliable source such as the official repositories or a trusted mirror.&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Upload the Packages to Unity Catalog or Workspace&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;UL class="ul1"&gt;
&lt;LI class="li1"&gt;Upload the downloaded .deb files to a Unity Catalog volume or a workspace location (DBFS). You can use the Databricks UI or the Databricks CLI to upload these files. For example, you can use the following CLI command to upload to a Unity Catalog volume:&lt;BR /&gt;&lt;BR /&gt;databricks fs cp local_path_to_deb_file /Volumes/your_catalog/your_schema/your_volume/&lt;BR /&gt;Bash&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Modify the Init Script&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;UL class="ul1"&gt;
&lt;LI class="li1"&gt;Update your init script to install the packages from the local volume instead of downloading them from the internet. Here’s an example of how your init script might look:&lt;BR /&gt;&lt;BR /&gt;#!/bin/bash&lt;/LI&gt;
&lt;LI class="li1"&gt;echo "----------------INIT SCRIPT---------------"&lt;/LI&gt;
&lt;LI class="li1"&gt;echo "----------------Installing libreoffice---------------"&lt;/LI&gt;
&lt;LI class="li1"&gt;dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/libreoffice.deb&lt;/LI&gt;
&lt;LI class="li1"&gt;echo "----------------Installing python3-uno---------------"&lt;/LI&gt;
&lt;LI class="li1"&gt;dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/python3-uno.deb&lt;/LI&gt;
&lt;LI class="li1"&gt;echo "----------------Installing poppler-utils---------------"&lt;/LI&gt;
&lt;LI class="li1"&gt;dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/poppler-utils.deb&lt;/LI&gt;
&lt;LI class="li1"&gt;echo "----------INIT SCRIPT COMPLETE------------"&lt;/LI&gt;
&lt;/UL&gt;
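&lt;LI class="li1"&gt;&lt;STRONG&gt;Check What Was Installed&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;UL class="ul1"&gt;
&lt;LI class="li1"&gt;After the init script has run, it may help to confirm from a notebook cell which launcher actually ended up on the PATH, because some .deb builds install it under a versioned name such as libreoffice24.8 rather than libreoffice. The following is a minimal sketch, not a definitive check: the function name find_executables and the two name prefixes are illustrative assumptions, not part of any Databricks or LibreOffice API.

```python
import os
from pathlib import Path


def find_executables(prefixes=("libreoffice", "soffice"), search_path=None):
    """Return sorted paths of executables whose file names start with a prefix.

    search_path defaults to the PATH environment variable. The default
    prefixes are assumptions about how the launcher might be named.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = set()
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        try:
            entries = list(Path(directory).iterdir())
        except OSError:
            # Skip directories that cannot be listed (for example, no permission).
            continue
        for entry in entries:
            name_matches = entry.name.startswith(tuple(prefixes))
            if name_matches and entry.is_file() and os.access(entry, os.X_OK):
                found.add(str(entry))
    return sorted(found)


if __name__ == "__main__":
    matches = find_executables()
    if matches:
        for match in matches:
            print("found:", match)
    else:
        print("no libreoffice or soffice executable found on PATH")
```

If the only match is a versioned name, calling that name directly, or linking it to libreoffice in the init script, is one way to proceed.&lt;/LI&gt;
&lt;/UL&gt;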
&lt;/OL&gt;</description>
      <pubDate>Thu, 14 Nov 2024 19:44:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98846#M39856</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2024-11-14T19:44:11Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98963#M39872</link>
      <description>&lt;P&gt;Thanks Alberto!&amp;nbsp; There were 42 deb files, so I just changed my script to:&lt;/P&gt;&lt;P&gt;sudo dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/*.deb&lt;/P&gt;&lt;P&gt;The init_script log shows that it unpacks everything, sets the packages up, and processes the triggers, but the package is not actually installed, and I cannot find where it would have been installed to.&amp;nbsp; I do like this alternative if I can figure out how to get it to work.&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Scott&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 15:35:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/98963#M39872</guid>
      <dc:creator>TX-Aggie-00</dc:creator>
      <dc:date>2024-11-15T15:35:51Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/106131#M42397</link>
      <description>&lt;P&gt;I followed your command and it worked. The only problem is that it runs under the `libreoffice24.8` command and not `libreoffice`. I ran `which libreoffice24.8` and then created a link with `sudo ln -s /usr/local/bin/libreoffice24.8 /usr/local/bin/libreoffice`, and it now works when I use `libreoffice`.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2025 19:15:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/106131#M42397</guid>
      <dc:creator>virtualdvid2</dc:creator>
      <dc:date>2025-01-17T19:15:42Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/106132#M42398</link>
      <description>&lt;P&gt;Thanks for posting your solution!&amp;nbsp; Hopefully it helps someone else with the same issue.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2025 19:24:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/106132#M42398</guid>
      <dc:creator>TX-Aggie-00</dc:creator>
      <dc:date>2025-01-17T19:24:39Z</dc:date>
    </item>
    <item>
      <title>Re: Installing linux packages on cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/109011#M43202</link>
      <description>&lt;P&gt;It only works on the driver; when I try to use the whole cluster, the nodes can't access the command.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2025 17:52:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-linux-packages-on-cluster/m-p/109011#M43202</guid>
      <dc:creator>virtualdvid</dc:creator>
      <dc:date>2025-02-05T17:52:38Z</dc:date>
    </item>
  </channel>
</rss>

