Unable to install poppler-utils

Deloitte_DS
New Contributor II

Hi,

I'm trying to install the system-level package "poppler-utils" on the cluster. I added the following line to my init.sh script:

sudo apt-get -f -y install poppler-utils

I got the following error: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

If I run the same command at the notebook level, I don't get this error.

Can anyone help me with this issue and explain how to install system-level packages at the cluster level using init scripts?


Kaniz
Community Manager

Hi @Deloitte_DS, you can use an init script to install system-level packages at the cluster level in Databricks. An init script is a shell script that runs during the startup of each cluster node, before the Spark driver or worker JVM starts. You can use init scripts to install packages and libraries that are not included in the Databricks runtime. The error you are getting suggests that the poppler binaries (such as pdfinfo) are not being found in the PATH. This could be due to how the environment variables on the cluster nodes are set up.
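A minimal init script sketch for this case, assuming a Debian/Ubuntu-based Databricks runtime image where apt-get is available. Databricks runs cluster-scoped init scripts as root, so sudo should not be required. The script refreshes the package index before installing, then checks that pdfinfo (the poppler binary pdf2image calls to get the page count) is actually on PATH, exiting non-zero otherwise so a failed install shows up at cluster start rather than later as PDFInfoNotInstalledError. Treat the package flags as assumptions to adapt to your runtime.

```shell
#!/bin/bash
# Sketch of a cluster-scoped init script that installs poppler-utils on every node.
# Assumes a Debian/Ubuntu image with apt-get; init scripts run as root, so no sudo.
set -euo pipefail

# Never prompt during apt operations (there is no terminal during cluster startup).
export DEBIAN_FRONTEND=noninteractive

# Refresh the package index first; a stale index can make the install fail.
apt-get update
apt-get install -y --no-install-recommends poppler-utils

# Fail loudly at startup if the binary is still not resolvable through PATH.
if ! command -v pdfinfo >/dev/null 2>&1; then
  echo "poppler-utils install finished but pdfinfo is not on PATH" >&2
  exit 1
fi
echo "pdfinfo installed at $(command -v pdfinfo)"
```

Attach the file as a cluster-scoped init script and restart the cluster. Because it runs on every node, the workers get the package too, whereas an apt-get run from a %sh notebook cell only affects the driver node.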

Deloitte_DS
New Contributor II

Hi Kaniz, I tried including it in the init script, but it still shows the same error. The path I gave is "usr/bin". How can I navigate to this path to check whether my package is installed? I'd also like to know how to navigate to databricks/bin/python, and how to check the environment variables.
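A minimal diagnostic sketch for these questions, assuming it is pasted into a %sh notebook cell (which runs on the driver node only). check_poppler is a hypothetical helper name. It prints PATH, looks up pdfinfo using only shell builtins, and, on Debian/Ubuntu images, asks dpkg-query whether poppler-utils is recorded as installed. The /databricks/python/bin/python location is an assumption about where the notebook's Python lives and may differ between runtime versions; from a Python cell, import sys; print(sys.executable) reports the actual interpreter.

```shell
# Diagnostic sketch for a %sh cell: shows PATH and whether poppler's pdfinfo
# (the binary pdf2image uses) is visible. Returns non-zero if pdfinfo is missing.
check_poppler() {
  local status=0 bin

  echo "PATH=${PATH}"

  # Is pdfinfo resolvable through PATH? (command -v is a shell builtin.)
  if bin="$(command -v pdfinfo)"; then
    echo "pdfinfo found at: ${bin}"
  else
    echo "pdfinfo not found on PATH" >&2
    status=1
  fi

  # Package status as recorded by apt/dpkg (Debian/Ubuntu images only).
  if command -v dpkg-query >/dev/null 2>&1; then
    echo "dpkg status: $(dpkg-query -W -f='${Status}' poppler-utils 2>/dev/null || echo 'not installed')"
  fi

  # Assumed location of the notebook Python interpreter; may vary by runtime.
  if [ -x /databricks/python/bin/python ]; then
    echo "Notebook Python: /databricks/python/bin/python"
  fi

  return "$status"
}
```

After defining it, run check_poppler in the same cell. A non-zero return means pdfinfo is not resolvable through the PATH that cell sees, even if files exist on disk. Note that "usr/bin" without a leading slash is a relative path; the absolute system directory is /usr/bin.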
