cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Insufficient privileges:User does not have permission SELECT on any file

GeKo
New Contributor III

Hello,

after switching to "shared cluster" usage a python job is failing with error message:

 

 

Py4JJavaError: An error occurred while calling o877.load.
: org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
User does not have permission SELECT on any file.

 

 

This error happens on the attempt of reading messages from a Kafka topic, according to the stacktrace (in the spark method spark_.read) =>

 

 

    288 else:
    289     raw_df = (
    290         self.spark_.read.format("kafka")
    291         .option(
    292             "kafka.bootstrap.servers",
    293             self.kafka_secrets.kafka_bootstrap_servers,
    294         )
    295         .option("subscribe", topic.topic)
    296         .option("groupIdPrefix", topic.consumer_group_prefix)
    297         .option("startingOffsets", "earliest")
    298         .option("failOnDataLoss", "false")
    299         .option("includeHeaders", "true")
    300         .options(**self.sasl_ssl_auth_options)
    301         .options(**spark_opts)
--> 302         .load()
    303     ).drop("timestampType")

 

 

 

 The job runs fine if "streaming" is enabled, means we use spark_.readStream instead.

What exactly is raising the "INSUFFICIENT_PERMISSIONS" error, at using "spark_.read" methon , and how to get rid of it ?!?!

Usually this error is thrown if someone wants to access data on DBFS or has tableACLs enabled, but both of them is not the case here.

Context:

  • using shared cluster
  • everything is managed via UnityCatalog
  • no Hive metastore is in use, table ACLs are disabled
  • the job does not interact with any data from DBFS (it simply wants to read from Kafka), also potential checkpoints of Kafka are configured to use UC Volume
  • I know that the statement "grant select on any file..." would solve the problem, but I don't want to use it, since I explicitly do not want to allow something on DBFS which I do not want to use anyways, neither Hive metastore related stuff

Since the difference in behaviour is between using spark_.read vs spark_.readStream my guess is, that the spark_.read is internally trying to access/interact with Hive-Metastore

Any hint how to eliminate this issue is highly appreciated 😄

5 REPLIES 5

Hkesharwani
Contributor II

Hi, The reason for this issue could be shared cluster, Unity catalog best supports with personal cluster or job clusters.
I would suggest try using personal cluster.
Check out the below article this might help
https://community.databricks.com/t5/data-engineering/create-table-using-a-location/td-p/68725

Harshit Kesharwani
Data engineer at Rsystema

GeKo
New Contributor III

Hello @Hkesharwani ,

thanks for replying.

Indeed, as I stated in the beginning of my post, the issue occurs only with shared cluster usage (single user cluster all is fine). Since I *have to* switch to shared cluster (rowlevel security is only available there atm.), it would be great if someone provides any insights of what is causing this issue on shared clusters.

sravs_227
New Contributor II

hey @GeKo 

did you get any solution ?

GeKo
New Contributor III

Hi @sravs_227 ,
the issue was, that the checkpoint directory (while reading from kafka) was set to a dbfs folder. We switched this now to also UC volume

Uj337
New Contributor III

Hi @GeKo 
The checkpoint directory, is that set on cluster level or how do we set that ? Can you please help me with this ?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group