<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue while reading excel file in qatar region in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/issue-while-reading-excel-file-in-qatar-region/m-p/139153#M51105</link>
    <pubDate>Sat, 15 Nov 2025 04:09:00 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-11-15T04:09:00Z</dc:date>
    <item>
      <title>Issue while reading excel file in qatar region</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-reading-excel-file-in-qatar-region/m-p/138910#M51043</link>
      <description>&lt;P&gt;I have installed the Excel library version com.crealytics:spark-excel_2.12:3.5.1_0.20.4.&lt;/P&gt;&lt;P&gt;When I try to read the file using the code below, I get the following error.&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("abfss://container_name@storage_account.dfs.core.windows.net/dop_testing/PrivilegeSheet.xlsx")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Error:&lt;BR /&gt;[&lt;A href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#data_source_not_found" target="_blank" rel="noopener noreferrer"&gt;DATA_SOURCE_NOT_FOUND&lt;/A&gt;] Failed to find the data source: com.crealytics.spark.excel. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02&lt;/P&gt;</description>
      <pubDate>Thu, 13 Nov 2025 12:02:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-reading-excel-file-in-qatar-region/m-p/138910#M51043</guid>
      <dc:creator>Sahil0007</dc:creator>
      <dc:date>2025-11-13T12:02:54Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while reading excel file in qatar region</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-reading-excel-file-in-qatar-region/m-p/139153#M51105</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/125532"&gt;@Sahil0007&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Thanks for sharing the code and error. This specific error means Spark can’t find the Excel data source on your cluster.&lt;/P&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;What the error means&lt;/H3&gt;
&lt;P class="qt3gz91 paragraph"&gt;The message “[DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.crealytics.spark.excel” is raised when the provider isn’t available on the cluster (not installed, incompatible, or not loadable).&lt;/P&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;How to fix it on Databricks&lt;/H3&gt;
&lt;P class="qt3gz91 paragraph"&gt;The most common causes and resolutions:&lt;/P&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Install the Excel connector as a JVM/Maven library on the cluster (not with pip). This package is not a Python wheel; it must be installed as a JVM library using Maven coordinates at the cluster level (Compute &amp;gt; your cluster &amp;gt; Libraries &amp;gt; Install new &amp;gt; Maven).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Pick the Maven coordinate that matches your cluster’s Spark and Scala versions. The artifact needs the correct Scala suffix (for example, “_2.12” vs “_2.13”) and a version aligned to your Spark version; check both against what is published on Maven Central before installing.&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Example that is known to work on Spark 3.5/Scala 2.12 clusters: com.crealytics:spark-excel_2.12:3.5.0_0.20.3.&lt;/P&gt;&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;If you’re on serverless compute, be aware that installing arbitrary Maven libraries isn’t supported there. Use a classic all-purpose cluster or another supported approach; otherwise you’ll keep getting the “data source not found” error even after attempting the install via the API.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;After installing a new cluster library, detach and reattach the notebook so it picks up the library. If the data source still isn’t found, restart the cluster so Spark loads the library on the driver and executors.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Use the correct format string for the version you installed:&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;For the classic com.crealytics package, use format("com.crealytics.spark.excel").&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;Some newer releases/forks expose the short name "excel", so format("excel") works as well; this depends on the specific artifact (for example, the dev.mauch fork on newer DBRs).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
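&lt;P class="qt3gz91 paragraph"&gt;The coordinate pattern described above can be assembled from its parts. This is a minimal sketch, assuming the group:artifact_scalaSuffix:sparkVersion_connectorVersion layout of the example coordinate; the helper name spark_excel_coordinate is hypothetical, and whether a given combination is actually published still has to be checked on Maven Central.&lt;/P&gt;

```python
def spark_excel_coordinate(scala_binary, spark_version, connector_version):
    """Assemble a spark-excel Maven coordinate from its three version parts.

    Hypothetical helper: it only formats a string and does not check that
    the resulting artifact exists on Maven Central.
    """
    parts = {
        "scala_binary": scala_binary,
        "spark_version": spark_version,
        "connector_version": connector_version,
    }
    for name, value in parts.items():
        if not value or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return (
        f"com.crealytics:spark-excel_{scala_binary}:"
        f"{spark_version}_{connector_version}"
    )


# The example coordinate from this reply, rebuilt from its parts.
coordinate = spark_excel_coordinate("2.12", "3.5.0", "0.20.3")
print(coordinate)
```

&lt;P class="qt3gz91 paragraph"&gt;Paste the resulting string into the Maven coordinates field when installing the library on the cluster.&lt;/P&gt;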
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Quick verification steps&lt;/H3&gt;
&lt;P class="qt3gz91 paragraph"&gt;1) Confirm cluster runtime and versions (to select the right coordinate):&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-built_in"&gt;print&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"Spark:"&lt;/SPAN&gt;, spark.version)
&lt;SPAN class="hljs-built_in"&gt;print&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"DBR:"&lt;/SPAN&gt;, spark.conf.get(&lt;SPAN class="hljs-string"&gt;"spark.databricks.clusterUsageTags.sparkVersion"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"n/a"&lt;/SPAN&gt;))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P class="qt3gz91 paragraph"&gt;Then install the Maven coordinate under Compute &amp;gt; your cluster &amp;gt; Libraries &amp;gt; Install new &amp;gt; Maven; search Maven Central and select the artifact that matches your Scala suffix and Spark version.&lt;/P&gt;
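&lt;P class="qt3gz91 paragraph"&gt;The DBR string printed in step 1 typically embeds the Scala version, which gives you the suffix to look for. This is a minimal sketch, assuming the string looks like 15.4.x-scala2.12 (an illustrative value, not taken from this thread); the helper name scala_suffix_from_dbr is hypothetical.&lt;/P&gt;

```python
import re


def scala_suffix_from_dbr(dbr_version):
    """Return the Scala artifact suffix (e.g. "_2.12") found in a DBR string.

    Returns None when no "scalaX.Y" fragment is present, for example when
    the config lookup fell back to "n/a".
    """
    match = re.search(r"scala(\d+\.\d+)", dbr_version or "")
    return "_" + match.group(1) if match else None


# Illustrative DBR string; on a cluster, pass the value printed in step 1.
print(scala_suffix_from_dbr("15.4.x-scala2.12"))
```

&lt;P class="qt3gz91 paragraph"&gt;If the helper returns None, read the Scala version from the runtime’s release notes instead.&lt;/P&gt;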
&lt;P class="qt3gz91 paragraph"&gt;2) Detach and reattach the notebook, or restart the cluster.&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;3) Re-run your code (this is fine as-is):&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;df = (spark.read.&lt;SPAN class="hljs-built_in"&gt;format&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"com.crealytics.spark.excel"&lt;/SPAN&gt;)
      .option(&lt;SPAN class="hljs-string"&gt;"header"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"true"&lt;/SPAN&gt;)
      .option(&lt;SPAN class="hljs-string"&gt;"inferSchema"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"true"&lt;/SPAN&gt;)
      .load(&lt;SPAN class="hljs-string"&gt;"abfss://container_name@storage_account.dfs.core.windows.net/dop_testing/PrivilegeSheet.xlsx"&lt;/SPAN&gt;))
df.show(&lt;SPAN class="hljs-number"&gt;5&lt;/SPAN&gt;)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P class="qt3gz91 paragraph"&gt;If you installed a version that registers the short name, you can alternatively try:&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;df = (spark.read.&lt;SPAN class="hljs-built_in"&gt;format&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"excel"&lt;/SPAN&gt;)
      .option(&lt;SPAN class="hljs-string"&gt;"header"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"true"&lt;/SPAN&gt;)
      .option(&lt;SPAN class="hljs-string"&gt;"inferSchema"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"true"&lt;/SPAN&gt;)
      .load(&lt;SPAN class="hljs-string"&gt;"abfss://container_name@storage_account.dfs.core.windows.net/dop_testing/PrivilegeSheet.xlsx"&lt;/SPAN&gt;))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Workarounds if you can’t install the library&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;Read with pandas on the driver, then convert to Spark:&lt;/LI&gt;
&lt;/UL&gt;
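&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; pandas &lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt; pd
&lt;SPAN class="hljs-comment"&gt;# pandas does not use Spark's storage credentials, so copy the file to driver-local disk first.&lt;/SPAN&gt;
local_path = &lt;SPAN class="hljs-string"&gt;"/tmp/PrivilegeSheet.xlsx"&lt;/SPAN&gt;
dbutils.fs.cp(&lt;SPAN class="hljs-string"&gt;"abfss://container_name@storage_account.dfs.core.windows.net/dop_testing/PrivilegeSheet.xlsx"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"file:"&lt;/SPAN&gt; + local_path)
pdf = pd.read_excel(local_path)  &lt;SPAN class="hljs-comment"&gt;# reading .xlsx requires the openpyxl package&lt;/SPAN&gt;
df = spark.createDataFrame(pdf)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P class="qt3gz91 paragraph"&gt;This avoids the JVM data source, but the whole file is loaded into driver memory, so it is less scalable for very large files. The local path /tmp/PrivilegeSheet.xlsx is only an illustrative choice. Reading the abfss:// URL directly with pandas would generally need the adlfs package and explicit storage credentials instead.&lt;/P&gt;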
&lt;P class="qt3gz91 paragraph"&gt;Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Nov 2025 04:09:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-reading-excel-file-in-qatar-region/m-p/139153#M51105</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-15T04:09:00Z</dc:date>
    </item>
  </channel>
</rss>

