Unable to read excel files in the Azure databricks (UC enabled workspace)
2 weeks ago
Hello,
After adding the Maven library `com.crealytics:spark-excel_2.12:0.13.5` to the artifact allowlist, I installed it at the Azure Databricks cluster level (shared access mode, Unity Catalog enabled, Runtime 15.4). Then I tried to create a df for the Excel file as follows:
```python
df = spark.read.format('com.crealytics.spark.excel').option('header', True).load('abfss://dev@dlake.dfs.core.windows.net/myfiles/file1.xlsx')
```
When I try to display the df, or list its columns with df.columns, I get the following error:
```
[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on any file. SQLSTATE: 42501

File <command-6451936430536458>, line 1
----> 1 df.columns

File /databricks/spark/python/pyspark/sql/connect/client/core.py:2155, in SparkConnectClient._handle_rpc_error(self, rpc_error)
   2140 raise Exception(
   2141     "Python versions in the Spark Connect client and server are different. "
   2142     "To execute user-defined functions, client and server should have the "
   (...)
-> 2155 raise convert_exception(
   2156     info,
   2157     status.message,
   2158     self._fetch_enriched_error(info),
   2159     self._display_server_stack_trace(),
   2160 ) from None
```
Additional information:
1. Since it's a Unity Catalog-enabled workspace, I'm using an external location rather than a traditional mount point for the ADLS Gen2 account where the Excel files are stored. In a different (non-Unity Catalog) workspace, the same code created the df successfully, but there the file location was a mount-point path.
2. I also tried reading the Excel file from a Volume; still the same error.
3. I was able to create a df for CSV and JSON files in the same ADLS Gen2 account, for example:
```python
df = spark.read.format('csv').option('header', True).load('abfss://dev@dlake.dfs.core.windows.net/bom/validity/BOM_VALIDITY.csv')
```
This works without any issues and the df displays fine. On the external location I have the following privileges: CREATE EXTERNAL TABLE, READ FILES, and WRITE FILES.
What's wrong? Kindly help me in this regard. TIA.
- Labels: Spark
2 weeks ago
At first glance it seems you have a permission issue. Here are some things to consider and try. Regards, Louis.
The error `[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on any file` indicates that the user lacks the necessary permissions to access files in the Unity Catalog-enabled workspace. Below are potential causes and solutions for resolving this issue:
---
Cause
1. Unity Catalog Permissions: Unity Catalog enforces fine-grained access control, and users must have the appropriate privileges to access external locations or files. Permissions like `SELECT`, `USE SCHEMA`, and `USE CATALOG` are required for accessing data stored in external locations.
2. Cluster Access Mode: If the cluster is running in shared (user isolation) mode, additional restrictions apply. Users cannot directly access files unless permissions are explicitly granted.
---
Solutions
1. Verify External Location Permissions
Ensure that you have the required privileges on the external location where your Excel file resides:
- Run the following SQL commands to check permissions:
```sql
DESCRIBE EXTERNAL LOCATION <location-name>;
SHOW GRANTS ON EXTERNAL LOCATION <location-name>;
```
- Confirm that you have `READ FILES` privilege on the external location.
2. Grant Necessary Permissions
If permissions are missing, an admin can grant them using SQL commands:
- Grant `SELECT` permission on files:
```sql
GRANT SELECT ON ANY FILE TO `<user>`;
```
- Ensure you also have `USE CATALOG` and `USE SCHEMA` privileges for the relevant catalog and schema:
```sql
GRANT USE CATALOG ON CATALOG <catalog-name> TO `<user>`;
GRANT USE SCHEMA ON SCHEMA <schema-name> TO `<user>`;
```
3. Check Cluster Configuration
Ensure your cluster is properly configured for Unity Catalog access:
- Use Databricks Runtime 15.4 or above, which supports fine-grained access control with data filtering.
- If possible, switch to single-user (dedicated) access mode to bypass shared-mode restrictions temporarily.
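As a sketch, the relevant fields of a cluster definition (Clusters API JSON) for single-user mode would look like the following; the cluster name, node type, and user are placeholders:

```json
{
  "cluster_name": "uc-single-user",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 1,
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "user@example.com"
}
```

The `data_security_mode` field is what distinguishes single-user from shared (`USER_ISOLATION`) clusters.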
4. Alternative Approach for Excel Files
Reading Excel files in Databricks can be challenging due to library dependencies. Instead of relying on `com.crealytics.spark.excel`, consider using Pandas-based methods:
- Install the `openpyxl` library:
```python
%pip install openpyxl
```
- Read the file using Pandas and convert it to a Spark DataFrame:
```python
# pandas-on-Spark can read directly from abfss:// paths
from pyspark.pandas import read_excel

pd_df = read_excel("abfss://dev@dlake.dfs.core.windows.net/myfiles/file1.xlsx", sheet_name="Sheet1")
spark_df = pd_df.to_spark()
display(spark_df)
```
This bypasses potential issues with Unity Catalog permissions specific to Excel libraries.
5. Debugging External Location Setup
If issues persist, confirm the external location setup:
- Ensure that storage credentials are correctly configured and bound to the external location.
- Avoid mounting storage accounts directly to DBFS if they are being used as external locations in Unity Catalog.
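These checks can be run as SQL; `my_ext_location` and `my_storage_credential` below are placeholder names for your own objects:

```sql
-- Inspect the external location and the storage credential it is bound to
DESCRIBE EXTERNAL LOCATION my_ext_location;
DESCRIBE STORAGE CREDENTIAL my_storage_credential;
```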
Additional Notes
1. The ability to read CSV and JSON files successfully suggests that your permissions for those formats are correctly configured but may not extend to Excel files due to specific library requirements or Unity Catalog restrictions.
2. If writing Excel files fails, note that Unity Catalog volumes do not support non-sequential writes for formats like Excel.
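The "write locally, then copy" pattern implied by note 2 can be sketched with the standard library alone. The file bytes and the volume path are placeholders here; a temp directory stands in for `/Volumes/...` so the sketch runs anywhere:

```python
import os
import shutil
import tempfile

# 1) Write the file to local disk first (on Databricks this would be a
#    path like /local_disk0/tmp/; a temp dir stands in for it here)
local_dir = tempfile.mkdtemp()
local_path = os.path.join(local_dir, "report.xlsx")
with open(local_path, "wb") as f:
    f.write(b"placeholder bytes")  # an Excel writer would produce these

# 2) Copy the finished file into the Unity Catalog volume in a single
#    sequential operation (placeholder: a temp dir stands in for the volume)
volume_dir = tempfile.mkdtemp()  # stands in for /Volumes/cat/schema/vol
dest = shutil.copyfile(local_path, os.path.join(volume_dir, "report.xlsx"))
print(os.path.exists(dest))
```

The point is that the volume only ever sees one sequential copy, never the writer's non-sequential seeks.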
2 weeks ago
Thanks for your input. To reiterate, I'm using a Databricks Runtime 15.4 cluster, and I have the following permissions on the external location:
- create external table
- read files
- write files
So I think I have enough privileges on the external location. In addition, I have CSV and JSON files in the same container and directory, and I can create and display a df for them without any issues. The error occurs only for Excel files. If I have the required permissions, why is the error permission-based: `[INSUFFICIENT_PERMISSIONS] Insufficient privileges`?
2 weeks ago
I did a little more digging and found further information:
- Third-Party Libraries: Reading Excel files in Databricks often requires a third-party library such as `com.crealytics:spark-excel`. However, in Unity Catalog-enabled environments, there are restrictions on third-party libraries due to security isolation concerns. The `com.crealytics:spark-excel` library may require additional permissions, such as `ANY FILE` access, to function correctly in a Unity Catalog environment. Without these explicit grants, errors may occur when attempting to read Excel files.
- Pandas Workaround: While third-party libraries might have access restrictions, the `pandas.read_excel` function can typically read Excel files without such issues, as it does not rely on the same security isolation mechanisms:
  ```python
  import pandas as pd
  df = pd.read_excel("/path/to/your/excel/file.xlsx")
  ```
- Unity Catalog Volumes Limitations: Direct writes and non-sequential access for Excel formats (e.g., `.xlsx`) are not supported in Unity Catalog volumes. As a workaround, perform the write on local storage first and then copy the file to the volume.
- Granting Permissions: If using `com.crealytics:spark-excel`, ensure the correct permissions are in place (e.g., granting `ANY FILE` on the data). Consult your Unity Catalog administrator to verify or adjust permissions.
- Intermediate Local File Operations: For writing Excel files, write them locally using a library like `xlsxwriter` and move them to the Unity Catalog volume after the operation is complete:
  ```python
  from shutil import copyfile
  import xlsxwriter

  # Write the Excel file locally
  workbook = xlsxwriter.Workbook('/local_disk0/tmp/excel.xlsx')
  worksheet = workbook.add_worksheet()
  worksheet.write(0, 0, "Key")
  worksheet.write(0, 1, "Value")
  workbook.close()

  # Copy to the Unity Catalog volume
  copyfile('/local_disk0/tmp/excel.xlsx', '/Volumes/my_catalog/my_schema/my_volume/excel.xlsx')
  ```

