Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Installing linux packages on cluster

TX-Aggie-00
Visitor

Hey everyone! We need to use LibreOffice in one of our automated tasks via a notebook. I have tried to install it via an init script that I attach to the cluster, but sometimes the program gets installed and sometimes it doesn't. For obvious reasons, I need to guarantee that these tasks run successfully. Here is my init script, which resides in my workspace ("install_libreoffice.sh"):

#!/bin/bash
echo "----------------INIT SCRIPT---------------"
apt-get update
echo "----------------Installing libreoffice---------------"
apt-get install -y libreoffice
echo "----------------Installing python3-uno---------------"
apt-get install -y python3-uno
echo "----------------Installing poppler-utils---------------"
apt-get install -y poppler-utils
echo "----------INIT SCRIPT COMPLETE------------"

When it is successfully installed I can run the following cell and it returns expected results:

import subprocess

# Ask LibreOffice for its version; output on stdout means the binary is installed and runnable
result = subprocess.run(["libreoffice", "--version"], capture_output=True, text=True)
print(result.stdout)

It will work, and then I will terminate the cluster and start it again, and it won't work. I have viewed the init script logs and everything looks good, but the program will not be installed, and %sh ps will not show the process running.
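
One way to tell whether the install actually landed after a restart is to look for the binaries on the PATH instead of looking for a running process, since ps only lists running processes and would not show an installed but idle LibreOffice. The cell below is a minimal sketch, not a confirmed fix. It assumes the packages from the init script put `libreoffice` and `pdftotext` (from poppler-utils) on the driver's PATH, and the helper name `missing_binaries` is hypothetical.

```python
import shutil


def missing_binaries(names):
    """Return the entries of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed binary names: `libreoffice` (libreoffice package) and
# `pdftotext` (poppler-utils package).
required = ["libreoffice", "pdftotext"]
missing = missing_binaries(required)

if missing:
    print(f"Missing binaries: {missing}")
else:
    print("All required binaries found on PATH")
```

In an automated task, the print in the `if missing:` branch could be swapped for a raised exception so the run stops before any conversion step is attempted.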

I have tried to install it in a %sh cell, but that doesn't work either.  What is the best way to get this consistently installed on a cluster?

Thanks,
Scott

1 REPLY

TX-Aggie-00
Visitor

Forgot to mention a few items of interest:

  • The cluster is a single node, so I would think installing via "%sh" would be sufficient
  • DRV - 12.2 LTS
  • Node - Standard D8s_vs (Azure)
