<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic pytesseract.pytesseract.TesseractNotFoundError in databricks notebook in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pytesseract-pytesseract-tesseractnotfounderror-in-databricks/m-p/55599#M30381</link>
    <description>&lt;P&gt;I'm trying to extract text from an image file in a Databricks notebook. I first installed the following libraries with pip: %pip install pytesseract tesseract pillow --upgrade&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;That didn't work; it threw the following error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I then installed the following libraries through the Libraries section of the cluster in Databricks:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;pillow&lt;/LI&gt;&lt;LI&gt;pytesseract&lt;/LI&gt;&lt;LI&gt;tesseract&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;That didn't work either.&lt;/P&gt;&lt;P&gt;Later I ran the following shell command in a Databricks notebook cell:&lt;/P&gt;&lt;P&gt;%sh&lt;/P&gt;&lt;P&gt;apt-get install -y tesseract-ocr&lt;/P&gt;&lt;P&gt;This command gave me the following error: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied) E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?&lt;/P&gt;&lt;P&gt;Here is the code I want to run in my Databricks notebook:&lt;/P&gt;&lt;P&gt;import pytesseract&lt;/P&gt;&lt;P&gt;from PIL import Image&lt;/P&gt;&lt;P&gt;img = Image.open(img_path)  # img_path is the path to the image file&lt;/P&gt;&lt;P&gt;img_gray = img.convert('L')  # convert to grayscale&lt;/P&gt;&lt;P&gt;text = pytesseract.image_to_string(img_gray)&lt;/P&gt;&lt;P&gt;I want the code to extract the text accurately from images. Please let me know what I'm doing wrong.&lt;/P&gt;</description>
    <pubDate>Thu, 21 Dec 2023 13:07:53 GMT</pubDate>
    <dc:creator>neha_ayodhya</dc:creator>
    <dc:date>2023-12-21T13:07:53Z</dc:date>
    <item>
      <title>pytesseract.pytesseract.TesseractNotFoundError in databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/pytesseract-pytesseract-tesseractnotfounderror-in-databricks/m-p/55599#M30381</link>
      <description>&lt;P&gt;I'm trying to extract text from an image file in a Databricks notebook. I first installed the following libraries with pip: %pip install pytesseract tesseract pillow --upgrade&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;That didn't work; it threw the following error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I then installed the following libraries through the Libraries section of the cluster in Databricks:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;pillow&lt;/LI&gt;&lt;LI&gt;pytesseract&lt;/LI&gt;&lt;LI&gt;tesseract&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;That didn't work either.&lt;/P&gt;&lt;P&gt;Later I ran the following shell command in a Databricks notebook cell:&lt;/P&gt;&lt;P&gt;%sh&lt;/P&gt;&lt;P&gt;apt-get install -y tesseract-ocr&lt;/P&gt;&lt;P&gt;This command gave me the following error: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied) E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?&lt;/P&gt;&lt;P&gt;Here is the code I want to run in my Databricks notebook:&lt;/P&gt;&lt;P&gt;import pytesseract&lt;/P&gt;&lt;P&gt;from PIL import Image&lt;/P&gt;&lt;P&gt;img = Image.open(img_path)  # img_path is the path to the image file&lt;/P&gt;&lt;P&gt;img_gray = img.convert('L')  # convert to grayscale&lt;/P&gt;&lt;P&gt;text = pytesseract.image_to_string(img_gray)&lt;/P&gt;&lt;P&gt;I want the code to extract the text accurately from images. Please let me know what I'm doing wrong.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Dec 2023 13:07:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pytesseract-pytesseract-tesseractnotfounderror-in-databricks/m-p/55599#M30381</guid>
      <dc:creator>neha_ayodhya</dc:creator>
      <dc:date>2023-12-21T13:07:53Z</dc:date>
    </item>
    <item>
      <title>Re: pytesseract.pytesseract.TesseractNotFoundError in databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/pytesseract-pytesseract-tesseractnotfounderror-in-databricks/m-p/56995#M30709</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96738"&gt;@neha_ayodhya&lt;/a&gt; - could you please try running the following as an init script on the Databricks cluster&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;sudo apt-get update -y
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y libtesseract-dev
/databricks/python/bin/pip install pytesseract
&lt;/LI-CODE&gt;
&lt;P&gt;and let us know whether that resolves the error.&lt;/P&gt;
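&lt;P&gt;For context, pytesseract is a wrapper that calls the tesseract executable, so installing Python packages with pip does not provide the OCR engine itself. Once the init script has run, a check along the following lines may help confirm that the binary is visible to the notebook. This is a minimal sketch, not a definitive implementation: find_tesseract and ocr_image are hypothetical helper names introduced here for illustration, and img_path is assumed to be the path to an image file as in the question.&lt;/P&gt;

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def ocr_image(img_path):
    """Run OCR on one image, failing early with a clear message if the engine is missing."""
    binary = find_tesseract()
    if binary is None:
        raise RuntimeError(
            "tesseract executable not found on PATH; install the tesseract-ocr "
            "system package (for example through a cluster init script)"
        )
    # Imported lazily so the PATH check above works even without these packages.
    import pytesseract
    from PIL import Image

    # Point pytesseract at the binary that was just located.
    pytesseract.pytesseract.tesseract_cmd = binary
    img_gray = Image.open(img_path).convert("L")  # grayscale, as in the question
    return pytesseract.image_to_string(img_gray)
```

&lt;P&gt;If find_tesseract() returns None in a notebook cell, the system package is most likely still missing on that node, and pytesseract would raise TesseractNotFoundError again.&lt;/P&gt;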
&lt;P&gt;Thanks, Shan&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jan 2024 21:27:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pytesseract-pytesseract-tesseractnotfounderror-in-databricks/m-p/56995#M30709</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-01-11T21:27:13Z</dc:date>
    </item>
  </channel>
</rss>

