
TesseractNotFoundError

feed
New Contributor III

I get the following error in Databricks:

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.


feed
New Contributor III

To install Tesseract on your Databricks cluster, you can run the following command in a notebook cell:

%sh apt-get update -y && apt-get install -y tesseract-ocr

After installing Tesseract, make sure the directory that contains the Tesseract executable is on your PATH environment variable. To do this, you can run the following command in a Databricks notebook:

%sh echo 'export PATH=/usr/bin:$PATH' >> ~/.bashrc && source ~/.bashrc

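This appends /usr/bin, the directory where apt-get installs the tesseract executable, to the PATH exported in ~/.bashrc. Keep in mind that /usr/bin is normally already on PATH, and that sourcing ~/.bashrc inside a %sh cell does not change the environment of the notebook's Python process, so this step on its own usually does not change what pytesseract can see.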
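If pytesseract still raises TesseractNotFoundError after the binary is installed, one option is to skip PATH lookup entirely and tell pytesseract where the executable lives through its `pytesseract.pytesseract.tesseract_cmd` attribute. The sketch below assumes the binary is either on the Python process's PATH or at one of two common locations (`/usr/bin/tesseract`, `/usr/local/bin/tesseract`); the helper names `locate_tesseract` and `configure_pytesseract` are hypothetical.

```python
import os
import shutil
from typing import Iterable, Optional

# Common install locations; an assumption, adjust for your runtime image.
DEFAULT_CANDIDATES = ("/usr/bin/tesseract", "/usr/local/bin/tesseract")


def locate_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES,
                     search_path: Optional[str] = None) -> Optional[str]:
    """Return the path of the tesseract executable, or None if not found.

    First searches the PATH this Python process sees (or `search_path` if
    given), then falls back to the explicit candidate locations.
    """
    found = shutil.which("tesseract", path=search_path)
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def configure_pytesseract(path: Optional[str] = None) -> str:
    """Point pytesseract at the tesseract binary and return the path used."""
    import pytesseract  # imported lazily so locate_tesseract works without it

    path = path or locate_tesseract()
    if path is None:
        raise RuntimeError("tesseract binary not found; install tesseract-ocr on the cluster")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once per notebook session, before any `pytesseract.image_to_string(...)` call, makes the binary location explicit instead of relying on whatever PATH the Python process inherited.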

Check if Tesseract OCR is installed on your Databricks cluster. You can do this by running the following command in a Databricks notebook:

%sh which tesseract
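From a Python cell, a rough equivalent of `which tesseract` that also confirms the binary actually runs is to call `tesseract --version` through `subprocess`. This is a sketch; the helper name `tesseract_version` is hypothetical. It merges stdout and stderr because older Tesseract releases print the version banner to stderr.

```python
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if it cannot run."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # older releases write the banner to stderr
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, no execute permission, non-zero exit, or timeout.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


# None here means pytesseract will not be able to find the binary either.
print(tesseract_version())
```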

After following these steps, you should be able to use pytesseract in your Databricks notebook without encountering the TesseractNotFoundError.

NandiniN
Honored Contributor

Hello @feed expedition

You can also try this:

  1. Create a new cluster or select an existing one in Databricks.
  2. In the "Libraries" tab of the cluster settings, click on "Install New" and select "PyPI".
  3. In the "Package" field, enter "pytesseract".
  4. Click on "Install" and wait for the installation to complete.

Thanks & Regards,

Nandini

feed
New Contributor III

Yes, of course. That works if you only need to install the Python library pytesseract.

But if you need to extract text from images, you must also install the Tesseract OCR engine on the cluster you are working on.

Otherwise you will get this error.
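The distinction here is between the Python wrapper (the `pytesseract` package from PyPI, which calls the `tesseract` command under the hood) and the Tesseract OCR engine itself. A small check like the hypothetical `diagnose_ocr_setup` below, run in a notebook cell, reports which of the two is missing. It only inspects the driver's Python environment and the PATH visible to that process.

```python
import importlib.util
import shutil
from typing import Dict, Optional


def diagnose_ocr_setup(search_path: Optional[str] = None) -> Dict[str, object]:
    """Report whether the Python wrapper and the Tesseract binary are present."""
    wrapper = importlib.util.find_spec("pytesseract") is not None
    binary = shutil.which("tesseract", path=search_path)
    return {
        "pytesseract_installed": wrapper,  # PyPI package (Libraries tab or pip)
        "tesseract_binary": binary,        # OCR engine (apt-get or init script)
        "ready": wrapper and binary is not None,
    }


report = diagnose_ocr_setup()
if report["pytesseract_installed"] and report["tesseract_binary"] is None:
    print("pytesseract is installed but the tesseract binary is not; "
          "OCR calls will raise TesseractNotFoundError")
```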

NandiniN
Honored Contributor

Ack. Thank you for sharing!

Anonymous
Not applicable

Hi @feed expedition

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

neha_ayodhya
New Contributor II

The command %sh apt-get install -y tesseract-ocr is not working in my new Databricks free trial account, although it worked fine in my old Databricks instance. I get the following error:

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?

I have installed both pytesseract and tesseract from the Libraries section of the cluster, as well as with pip install in the notebook, but even after all these steps I still get TesseractNotFoundError. Please let me know if anyone can help.

Hi @neha_ayodhya, in Databricks you might not have the necessary permissions to run the apt-get install command; the "are you root?" part of your error message points to exactly that.
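The "are you root?" message means apt-get ran without root privileges. A quick way to check this from Python before trying `%sh apt-get` is sketched below, assuming the `%sh` cell runs as the same user as the notebook's Python process and that the driver is Linux. The helper name `apt_privilege` is hypothetical, and whether passwordless `sudo` exists depends on how your workspace and cluster are configured.

```python
import os
import shutil
import subprocess


def apt_privilege() -> str:
    """Return 'root', 'sudo', or 'none' depending on how apt-get could be run."""
    if os.geteuid() == 0:
        return "root"  # plain `apt-get install` should be permitted
    if shutil.which("sudo") is None:
        return "none"
    try:
        # -n makes sudo fail immediately instead of prompting for a password.
        result = subprocess.run(
            ["sudo", "-n", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return "none"
    return "sudo" if result.returncode == 0 else "none"


print("apt-get privilege:", apt_privilege())
```

A result of "none" means that `%sh apt-get install` from the notebook will keep failing with the same lock error.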

 

However, on a cluster where you do have those permissions, you can try the following steps to resolve the TesseractNotFoundError:

 

  • Install Tesseract on your Databricks cluster using the following command:

%sh apt-get update -y && apt-get install -y tesseract-ocr

  • Add the directory containing the Tesseract executable to your PATH environment variable. You can do this by running the following command in a Databricks notebook:

%sh echo 'export PATH=/usr/bin:$PATH' >> ~/.bashrc && source ~/.bashrc

  • Check if Tesseract OCR is installed on your Databricks cluster by running the following command in a Databricks Notebook:

%sh which tesseract

After following these steps, you should be able to use pytesseract in your Databricks notebook witho....

 

If you still encounter issues, you can try installing Tesseract via an init script to the Databricks...

 

Here are the commands you can use in the init script:

 

#!/bin/bash
sudo apt-get update -y
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y libtesseract-dev
/databricks/python/bin/pip install pytesseract

Please note that these commands need to be run as root, which is why they're included in an init scr....
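Because `%sh` runs only on the driver node, an init script is also what gets the binary onto worker nodes, which matters if OCR runs inside Spark UDFs. If you want to generate the script from a notebook or keep it under version control, a sketch like the following writes the commands above to a file. It assumes a Debian/Ubuntu-based runtime with `/databricks/python/bin/pip`, as in the commands above, and drops `sudo` on the assumption that init scripts already run as root. The target path in the usage line is hypothetical; attach the resulting file in the cluster's init scripts configuration and restart the cluster.

```python
from pathlib import Path

# Mirrors the init script commands above; sudo is omitted because
# init scripts are assumed to run as root.
INIT_SCRIPT = """#!/bin/bash
set -euo pipefail
apt-get update -y
apt-get install -y tesseract-ocr libtesseract-dev
/databricks/python/bin/pip install pytesseract
"""


def write_init_script(target: str) -> Path:
    """Write the Tesseract init script to `target` and return its path."""
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(INIT_SCRIPT)
    return path


# Hypothetical location; replace with a path your cluster can read.
# write_init_script("/Workspace/Users/<you>/init/install-tesseract.sh")
```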

 

I hope this helps! Let me know if you have any other questions.
