Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Serverless: SparkConnectGrpcException: 403 Forbidden: 403: Invalid access token.

hietpas
New Contributor III
I am running a Databricks job with a run-as service principal on Serverless compute. The job has a single notebook task. The notebook runs successfully on job compute, or when I run it myself in the Workspace (still on serverless compute). However, I get a strange error when it runs as the service principal via the job:
 
SparkConnectGrpcException: 403 Forbidden: 403: Invalid access token. [ReqId: a4202901-13bb-40c0-8bdf-718db9f5da63] [Trace ID: 00-6433c827e5ef437784f44e8558df32ef-ae4061a78da73c36-00]
File <command-6464347790329852>, line 13
     11 if debug_mode:
     12     print('Dummy spark sql...')
---> 13 spark.sql(f"SELECT 1 as myValue");
     16 if debug_mode:
     17     print('About to declare var_destination_database_name...')
 
This notebook used to result in a different strange error:
[Screenshot of the earlier error: hietpas_0-1769632818999.png]

I was able to work around that error by adding the principal to the "admins" group. Not ideal, but it worked. While attempting to determine the fine-grained permissions required, I accidentally deleted the principal from the workspace. After re-adding the principal (and granting permissions on the catalog/schema), I started getting the new error noted above.

This exact same code and similar permissions still work in other environments. If the principal is in the admins group, it works. If not, I get the "SELECT on any file" error. Only this environment produces the token error. This seems like it could be a bug in how serverless runs.

The code that causes the error is the first "spark.sql" statement. In this case, I added a dummy line:

spark.sql("SELECT 1 as myValue")
1 ACCEPTED SOLUTION

MoJaMa
Databricks Employee

That error usually means you are trying to read from a path (for example /dbfs, /tmp, or cloud storage) that is not a registered UC location, or you are using a third-party library/connector that is not supported in Lakehouse Federation, Lakeflow Connect, etc.

The reason for this error is that Serverless uses the same permission model as Unity Catalog Standard (previously known as Shared) access mode and enforces both UC ACLs and legacy Table ACLs. In this mode, any "unknown" path referenced directly or indirectly (a connector could write temp files to /tmp, for example) falls into the legacy Table ACL world, and the only privilege in that old model is the ANY FILE grant (contrast this with READ FILES and WRITE FILES on each External Location in UC). ANY FILE is an all-or-nothing grant covering all non-UC paths (i.e., not per path).
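To make the contrast concrete, here is a sketch of the two grant models side by side; the principal and external-location names are assumptions, not from this thread:

```sql
-- Legacy Table ACL model: one all-or-nothing grant covering every non-UC path
GRANT SELECT ON ANY FILE TO `sp-etl-job`;

-- UC model: per-location privileges on a registered External Location
GRANT READ FILES  ON EXTERNAL LOCATION landing_zone TO `sp-etl-job`;
GRANT WRITE FILES ON EXTERNAL LOCATION landing_zone TO `sp-etl-job`;
```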

You should get this same error if you run this on a non-Serverless UC Standard Access Mode cluster.

If you cannot avoid touching such paths, you can grant:

GRANT SELECT ON ANY FILE TO `principal`

to the identity running the job, and it will work.

(Workspace admins get ANY FILE by default, which is why you don't need to grant it explicitly when running as an admin.)

In general, the best practice is to use only UC-registered Locations and Volumes; then you are unlikely to need this specific permission.
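For example, reading through a UC Volume path needs only the usual UC privileges on the Volume, not ANY FILE; the catalog, schema, and volume names below are hypothetical:

```sql
-- Governed by the Volume's UC privileges; no ANY FILE grant needed
SELECT * FROM read_files(
  '/Volumes/main/raw/landing/events/',
  format => 'json'
);
```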


3 REPLIES


hietpas
New Contributor III

Thanks @MoJaMa!

Originally, I attempted the grant from a SQL Warehouse and it failed. However, running it again in a notebook on serverless compute succeeded. This does appear to resolve the original issue.

On a side note, the original error occurred when creating a view over files using an abfss path. However, I modified it to use the Volumes path and the error persisted. I would have thought that would remove the need for the ANY FILE permission (since a Volume is a UC object). I'll do more testing on that.

As for the "SparkConnectGrpcException: 403 Forbidden: 403: Invalid access token." error, any insight into why this is happening? As far as I can tell, the principal has all the grants it originally had, but it can no longer run a spark.sql command on serverless compute (while this works fine in other environments).

MoJaMa
Databricks Employee

It's just a bad error message. Spark Connect is probably surfacing a not-so-useful section of the stack trace.

creating a view over files using an abfss path --> this is the "cloud storage path" I was talking about. If that path was not registered as a UC external location, it falls into the legacy TACL path (hence the need for ANY FILE).
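One way to move such a path out of the legacy TACL world is to register it as a UC external location and grant per-location privileges instead; the URL, credential, and principal names below are placeholders:

```sql
-- Register the abfss path with Unity Catalog (all names are placeholders)
CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
  URL 'abfss://container@storageaccount.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL my_storage_credential);

-- Per-location privilege instead of the all-or-nothing ANY FILE grant
GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `sp-etl-job`;
```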

For the Volume (assuming your files are in a registered UC Volume): if you try something like this, do you get an error? I ran it from a Serverless notebook, creating a view over files in a volume, and it does not require the ANY FILE grant. The same command should work from UC Standard clusters and DBSQL as well.
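The snippet from the original post did not survive extraction; a minimal sketch of the kind of statement meant here, with hypothetical catalog, schema, view, and volume names, would be:

```sql
-- View over files in a UC Volume; access is governed by the Volume's
-- UC privileges, so no ANY FILE grant is needed (names are hypothetical)
CREATE OR REPLACE VIEW main.analytics.events_v AS
SELECT * FROM read_files(
  '/Volumes/main/raw/landing/events/',
  format => 'json'
);
```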