I'm trying to extract text from an image file in a Databricks notebook. I installed the following libraries using pip:
%pip install pytesseract tesseract pillow --upgrade
That didn't work; it raised the following error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
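The error message says pytesseract could not find the tesseract executable. As far as I understand, pytesseract is only a Python wrapper around that executable, so installing it with pip does not install the OCR engine itself. A minimal sketch of a check that can be run in a notebook cell to see whether the binary is visible on PATH (the helper name find_tesseract_binary is my own, not part of any library):

```python
import shutil


def find_tesseract_binary(name: str = "tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_tesseract_binary()
if path is None:
    # The Python packages are installed, but the system binary is missing.
    print("tesseract executable not found on PATH")
else:
    print(f"tesseract executable found at {path}")
```

If this reports that the executable is missing, the problem is the system package rather than the Python libraries.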
I then installed the following libraries through the Libraries section of the cluster in Databricks:
- pillow
- pytesseract
- tesseract
That didn't work either.
Later I ran the following shell command in a Databricks notebook cell:
%sh
apt-get install -y tesseract-ocr
This command gave me the following error:
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
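The "are you root?" hint suggests the %sh cell is not running with root privileges, which apt-get needs in order to take the dpkg lock. A small sketch, assuming a Linux driver node, that reports the effective user from Python (the helper name running_as_root is hypothetical):

```python
import getpass
import os


def running_as_root() -> bool:
    """Return True when the current process has effective user id 0 (POSIX only)."""
    return os.geteuid() == 0


print(f"current user: {getpass.getuser()}")
print(f"running as root: {running_as_root()}")
```

If this prints False, the apt-get call fails for a permissions reason and not because the package name is wrong.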
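Here is the code I want to run in my Databricks notebook:
from PIL import Image
import pytesseract
img = Image.open(img_path)  # img_path is the path to the image file
img_gray = img.convert('L')  # convert to grayscale before OCR
text = pytesseract.image_to_string(img_gray)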
I want the code to extract text accurately from images. Where am I going wrong?