3 weeks ago
Hey everyone! We need to use LibreOffice in one of our automated tasks, run from a notebook. I have tried to install it via an init script that I attach to the cluster, but sometimes the program gets installed and sometimes it doesn't. For obvious reasons, I need to guarantee that these tasks run successfully. Here is my init script, which resides in my workspace as "install_libreoffice.sh":
#!/bin/bash
echo "----------------INIT SCRIPT---------------"
apt-get update
echo "----------------Installing libreoffice---------------"
apt-get install -y libreoffice
echo "----------------Installing python3-uno---------------"
apt-get install -y python3-uno
echo "----------------Installing poppler-utils---------------"
apt-get install -y poppler-utils
echo "----------INIT SCRIPT COMPLETE------------"
When it is successfully installed, I can run the following cell and it returns the expected results:
import subprocess
import os
result = subprocess.run(["libreoffice", "--version"], capture_output=True, text=True)
print(result.stdout)
It will work, and then I will terminate the cluster, start it again, and it won't work. I have viewed the init_script logs and everything looks good, but the program is not installed, and %sh ps does not show the process running.
I have tried to install it in a %sh cell, but that doesn't work either. What is the best way to get this consistently installed on a cluster?
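A note on checking the install: LibreOffice is a command-line program rather than a background service, so nothing shows up in ps until something launches it. Looking for the binary on PATH is a more direct test. Here is a minimal sketch of that check. The helper names are made up for illustration, and the fallback name "soffice" assumes the usual Debian/Ubuntu packaging:

```python
import shutil
import subprocess


def find_libreoffice(candidates=("libreoffice", "soffice")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def libreoffice_version(candidates=("libreoffice", "soffice")):
    """Return the output of `<binary> --version`, or None if unavailable."""
    path = find_libreoffice(candidates)
    if path is None:
        return None  # nothing on PATH: the install did not happen
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None
```

In a notebook cell, `print(libreoffice_version() or "LibreOffice not found on PATH")` as the first step of the job would make it fail fast with a clear message instead of partway through the task.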
Thanks,
Scott
3 weeks ago
Forgot to mention a few items of interest:
3 weeks ago
I think I determined the issue, just not sure how best to fix it. It seems the apt-get repository doesn't always work. I noticed that when the notebook fails, the init_scripts logs show a lot of 404 errors when downloading the packages. When the notebook is successful, there are no errors and I can see the packages get installed.
I updated the runtime to 15.4 LTS and that seems to be working consistently for now, but I am a bit nervous that this issue will pop up again.
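To catch this automatically rather than by reading the logs by hand, something like the sketch below could scan the init script log text for failed downloads. It assumes apt's usual wording for fetch failures ("Err:N ...", "404  Not Found", "E: Failed to fetch ..."). The sample log is invented for illustration and is not copied from a real cluster:

```python
import re

# Assumption: apt marks failed downloads with lines like these.
FETCH_ERROR = re.compile(r"^\s*(Err:\d+|E: )|404\s+Not Found")


def find_fetch_errors(log_text):
    """Return the log lines that look like apt download failures."""
    return [line for line in log_text.splitlines() if FETCH_ERROR.search(line)]


# Invented sample, only to show the shape of the input.
SAMPLE_LOG = """\
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 poppler-utils amd64 1.0 [186 kB]
Err:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 example-package amd64 1.0
  404  Not Found [IP: 192.0.2.10 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/example-package_1.0_amd64.deb  404  Not Found
Setting up poppler-utils (1.0) ...
"""

errors = find_fetch_errors(SAMPLE_LOG)
print(f"{len(errors)} suspicious line(s) found")
```

Note that the script above keeps going after a failed apt-get install and still prints "INIT SCRIPT COMPLETE", so the log can look fine at a glance even when a download failed.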
3 weeks ago - last edited 3 weeks ago
Hello @TX-Aggie-00,
To ensure that LibreOffice is consistently installed on your Databricks cluster without relying on internet access (which can fail sometimes), you can manually download the necessary packages and store them in a Unity Catalog volume or a workspace location. Here’s a step-by-step guide:
3 weeks ago
Thanks Alberto! There were 42 deb files, so I just changed my script to:
sudo dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/*.deb
The init_script log shows that it unpacks everything, sets the packages up, and processes the triggers, but the package is not actually installed and I cannot find where it would have been installed to. I do like this alternative if I can figure out how to get it to work.
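One way to narrow this down is to ask dpkg what state it thinks each package is in. A package that was unpacked but never configured reports a status such as "install ok unpacked" rather than "install ok installed". Here is a rough sketch using dpkg-query. The helper names are hypothetical, and the `run` parameter exists only so the function can be exercised without dpkg present:

```python
import subprocess


def dpkg_status(package, run=subprocess.run):
    """Return dpkg's status string for a package, or None if dpkg does not know it."""
    result = run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
    )
    # dpkg-query exits non-zero when the package is not in its database.
    return result.stdout.strip() if result.returncode == 0 else None


def is_fully_installed(status):
    """True only when dpkg reports the package as completely installed."""
    return status == "install ok installed"
```

Calling `dpkg_status("libreoffice")` from a notebook cell on the cluster, and then the same for a few of the 42 packages, should show whether dpkg considers them installed, half-configured, or unknown.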
Thanks,
Scott