Alberto_Umana
Databricks Employee

Hello @TX-Aggie-00,

To install LibreOffice on your Databricks cluster consistently without relying on internet access (which can fail intermittently), you can download the required packages ahead of time and store them in a Unity Catalog volume or a workspace location. Here’s a step-by-step guide:

  1. Download the Packages:
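    • On a local machine, download the .deb packages for LibreOffice, python3-uno, and poppler-utils from a reliable source such as the official repositories or a trusted mirror. Pick builds that match the Ubuntu release and CPU architecture of your cluster image. Note that dpkg does not resolve dependencies on its own, so also download any dependency packages that are not already present on the cluster.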
  2. Upload the Packages to Unity Catalog or Workspace:
    • Upload the downloaded .deb files to a Unity Catalog volume, or to another location the cluster can read such as workspace files or DBFS. You can use the Databricks UI or the Databricks CLI for the upload. For example, the following CLI command copies a file to a Unity Catalog volume:

      # The CLI addresses volume paths with the dbfs:/ scheme prefix
      databricks fs cp local_path_to_deb_file dbfs:/Volumes/your_catalog/your_schema/your_volume/
  3. Modify the Init Script:
    • Update your init script to install the packages from the local volume instead of downloading them from the internet. Here’s an example of how your init script might look:

      #!/bin/bash
      # On the cluster, Unity Catalog volumes are available under /Volumes (not /dbfs/Volumes)
      echo "----------------INIT SCRIPT---------------"
      echo "----------------Installing libreoffice---------------"
      dpkg -i /Volumes/your_catalog/your_schema/your_volume/libreoffice.deb
      echo "----------------Installing python3-uno---------------"
      dpkg -i /Volumes/your_catalog/your_schema/your_volume/python3-uno.deb
      echo "----------------Installing poppler-utils---------------"
      dpkg -i /Volumes/your_catalog/your_schema/your_volume/poppler-utils.deb
      echo "----------INIT SCRIPT COMPLETE------------"
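
If you want the init script to fail early with a clear message when a package file was never uploaded, something like the sketch below may help. It is only an illustration, not an official Databricks pattern: the function names missing_debs and install_debs are made up for this example, and the volume path and .deb file names are the same placeholders used above. It checks that every expected file exists, then passes them all to a single dpkg -i call so that dpkg can order packages that depend on one another.

```shell
#!/bin/bash
# Sketch only: helper names, paths and file names are hypothetical placeholders.

# Print the name of every expected package file that is missing from a directory.
# Usage: missing_debs <dir> <file>...
missing_debs() {
  dir="$1"; shift
  for name in "$@"; do
    [ -f "$dir/$name" ] || echo "$name"
  done
}

# Install the given .deb files from a directory with one dpkg call.
# Returns 1 without calling dpkg if any file is missing.
# Usage: install_debs <dir> <file>...
install_debs() {
  dir="$1"; shift
  missing="$(missing_debs "$dir" "$@")"
  if [ -n "$missing" ]; then
    echo "Missing package files in $dir: $missing" >&2
    return 1
  fi
  # Run in a subshell so the cd does not affect the rest of the init script.
  ( cd "$dir" && dpkg -i "$@" )
}
```

In the init script you would then call it once, for example: install_debs /Volumes/your_catalog/your_schema/your_volume libreoffice.deb python3-uno.deb poppler-utils.deb, and exit with a non-zero status if it fails so the problem shows up in the cluster event log.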