<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Getting Errors when reading data from Excel InternalError: pip is not installed for /local_disk in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/getting-errors-when-reading-data-from-excel-internalerror-pip-is/m-p/111550#M43934</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;We have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We now get the following error message when running the code that reads the Excel file:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4984.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4984.0 (TID 210941, 10.249.215.10, executor 2): org.apache.spark.SparkException: InternalError: pip is not installed for /local_disk0/spark-5c862e06-01f9-45b9-9e19-e3b66da55ba5/executor-e8eee9ca-9b55-452a-a841-338ce12461be/pythonVirtualEnvDirs/virtualEnv-1cf2ae47-3738-434e-9355-02a97960ebde&lt;/LI-CODE&gt;&lt;P&gt;We have two code blocks that run in sequence:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;dbutils.library.installPyPI("Office365-REST-Python-Client", version="2.4.4")

#########################
# some code to download excel from sharepoint
########################&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;sparkDF = spark.read.format("com.crealytics.spark.excel").option("header", "true").option("inferSchema", "true").load(file_name)&lt;/LI-CODE&gt;&lt;P&gt;We get the error when running the second code block. When I comment out the installPyPI line, the error goes away. I think the error is related to the library installation, but I don't understand why the install step itself succeeds and the failure only appears afterwards.&lt;/P&gt;&lt;P&gt;Could someone clarify this for us? Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Mon, 03 Mar 2025 09:16:40 GMT</pubDate>
    <dc:creator>Brianben</dc:creator>
    <dc:date>2025-03-03T09:16:40Z</dc:date>
    <item>
      <title>Getting Errors when reading data from Excel InternalError: pip is not installed for /local_disk</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-errors-when-reading-data-from-excel-internalerror-pip-is/m-p/111550#M43934</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;We have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We now get the following error message when running the code that reads the Excel file:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4984.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4984.0 (TID 210941, 10.249.215.10, executor 2): org.apache.spark.SparkException: InternalError: pip is not installed for /local_disk0/spark-5c862e06-01f9-45b9-9e19-e3b66da55ba5/executor-e8eee9ca-9b55-452a-a841-338ce12461be/pythonVirtualEnvDirs/virtualEnv-1cf2ae47-3738-434e-9355-02a97960ebde&lt;/LI-CODE&gt;&lt;P&gt;We have two code blocks that run in sequence:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;dbutils.library.installPyPI("Office365-REST-Python-Client", version="2.4.4")

#########################
# some code to download excel from sharepoint
########################&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;sparkDF = spark.read.format("com.crealytics.spark.excel").option("header", "true").option("inferSchema", "true").load(file_name)&lt;/LI-CODE&gt;&lt;P&gt;We get the error when running the second code block. When I comment out the installPyPI line, the error goes away. I think the error is related to the library installation, but I don't understand why the install step itself succeeds and the failure only appears afterwards.&lt;/P&gt;&lt;P&gt;Could someone clarify this for us? Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 09:16:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-errors-when-reading-data-from-excel-internalerror-pip-is/m-p/111550#M43934</guid>
      <dc:creator>Brianben</dc:creator>
      <dc:date>2025-03-03T09:16:40Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Errors when reading data from Excel InternalError: pip is not installed for /local_disk</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-errors-when-reading-data-from-excel-internalerror-pip-is/m-p/112106#M44109</link>
      <description>&lt;P&gt;I think the issue comes from installing Office365-REST-Python-Client with dbutils.library.installPyPI, which seems to create a conflicting Python environment for the Spark executors. Because notebook-scoped installs modify the environment dynamically, the executors and the driver can end up out of sync, which leads to errors like this one. A better approach is to install the library at the cluster level, using the Databricks UI or an init script, so that everything runs in a stable, shared environment.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Mar 2025 17:07:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-errors-when-reading-data-from-excel-internalerror-pip-is/m-p/112106#M44109</guid>
      <dc:creator>Renu_</dc:creator>
      <dc:date>2025-03-09T17:07:54Z</dc:date>
    </item>
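    <!--
    The cluster-level install suggested in the reply above could also be requested programmatically rather than through the UI. As a minimal sketch, assuming the Databricks Libraries REST API (POST /api/2.0/libraries/install) is available in the workspace, the request body for the package named in the question might be built as follows. The helper name `build_install_request` and the cluster ID placeholder are hypothetical and do not come from the thread.

```python
import json


def build_install_request(cluster_id: str, package: str, version: str) -> dict:
    """Build a JSON-serialisable body for a cluster-level PyPI library install.

    Assumption: the target is the Databricks Libraries API install endpoint,
    which takes a cluster ID and a list of library specifications.
    """
    return {
        "cluster_id": cluster_id,
        # Pin the version so every cluster start resolves the same package.
        "libraries": [{"pypi": {"package": f"{package}=={version}"}}],
    }


if __name__ == "__main__":
    # "CLUSTER_ID_PLACEHOLDER" is a stand-in; substitute a real cluster ID.
    body = build_install_request(
        "CLUSTER_ID_PLACEHOLDER", "Office365-REST-Python-Client", "2.4.4"
    )
    print(json.dumps(body, indent=2))
```

    Sending this body to the endpoint with an authenticated HTTP client should register the package as a cluster library, so the notebook would no longer need the installPyPI call that the original poster found to trigger the error.
    -->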
  </channel>
</rss>

