Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
Insufficient privileges: User does not have permission SELECT on any file

GeKo
New Contributor III

Hello,

after switching to a "shared cluster", a Python job fails with the following error message:

    Py4JJavaError: An error occurred while calling o877.load.
    : org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
    User does not have permission SELECT on any file.

According to the stack trace, the error is raised when reading messages from a Kafka topic with the Spark method spark_.read:

    else:
        raw_df = (
            self.spark_.read.format("kafka")
            .option(
                "kafka.bootstrap.servers",
                self.kafka_secrets.kafka_bootstrap_servers,
            )
            .option("subscribe", topic.topic)
            .option("groupIdPrefix", topic.consumer_group_prefix)
            .option("startingOffsets", "earliest")
            .option("failOnDataLoss", "false")
            .option("includeHeaders", "true")
            .options(**self.sasl_ssl_auth_options)
            .options(**spark_opts)
            .load()  # line 302 in the stack trace: INSUFFICIENT_PERMISSIONS is raised here
        ).drop("timestampType")

The job runs fine if "streaming" is enabled, i.e. when spark_.readStream is used instead.
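Since the streaming path reportedly works on the same shared cluster, a batch-style read can be approximated with readStream plus a one-shot trigger. This is a minimal sketch of a possible workaround, not a fix for the underlying permission check: build_kafka_options and read_kafka_once are hypothetical helper names, target_table and checkpoint_path are placeholders (the checkpoint would live on a UC Volume, as the job already does), and it assumes a Spark runtime that supports trigger(availableNow=True) (Spark 3.3 or later).

```python
from typing import Dict, Optional


def build_kafka_options(
    bootstrap_servers: str,
    topic: str,
    group_id_prefix: str,
    extra_options: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Collect the Kafka source options from the original job into one dict."""
    opts = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "groupIdPrefix": group_id_prefix,
        "startingOffsets": "earliest",
        "failOnDataLoss": "false",
        "includeHeaders": "true",
    }
    # Auth options (e.g. SASL_SSL settings) and any extra Spark options are
    # merged last, so they override the defaults above on key collisions.
    opts.update(extra_options or {})
    return opts


def read_kafka_once(spark, options: Dict[str, str], target_table: str, checkpoint_path: str):
    """Hypothetical workaround: consume the topic via readStream, write it to a
    table, and stop once all data available at start time has been processed."""
    query = (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        .drop("timestampType")
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # assumed UC Volume path
        .trigger(availableNow=True)  # process what is available now, then stop
        .toTable(target_table)
    )
    query.awaitTermination()
    return query
```

With availableNow the job still terminates like a batch run, but a streaming DataFrame cannot be returned and collected directly, so the result has to land in a sink (here a table) that downstream code then reads. Note that the checkpoint also changes semantics: later runs resume from the stored offsets rather than re-reading from "earliest".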

What exactly raises the "INSUFFICIENT_PERMISSIONS" error when using the "spark_.read" method, and how can I get rid of it?

Usually this error is thrown when someone accesses data on DBFS or when table ACLs are enabled, but neither applies here.

Context:

  • using a shared cluster
  • everything is managed via Unity Catalog
  • no Hive metastore is in use, table ACLs are disabled
  • the job does not interact with any data on DBFS (it simply reads from Kafka), and any Kafka checkpoints are configured to use a UC Volume
  • I know that the statement "grant select on any file..." would solve the problem, but I don't want to use it: I explicitly do not want to grant anything on DBFS, which I don't use anyway, nor anything related to the Hive metastore

Since the difference in behaviour is between spark_.read and spark_.readStream, my guess is that spark_.read internally tries to access or interact with the Hive metastore.

Any hint how to eliminate this issue is highly appreciated 😄

3 replies

Hkesharwani
Contributor II

Hi, the cause of this issue could be the shared cluster; Unity Catalog is best supported on personal (single-user) or job clusters.
I would suggest trying a personal cluster.
Check out the article below; it might help:
https://community.databricks.com/t5/data-engineering/create-table-using-a-location/td-p/68725

Harshit Kesharwani
Self-taught Data Engineer | Seeking Remote Full-time Opportunities

GeKo
New Contributor III

Hello @Hkesharwani ,

thanks for replying.

Indeed, as I stated at the beginning of my post, the issue occurs only on a shared cluster (on a single-user cluster everything works). Since I *have to* switch to a shared cluster (row-level security is only available there at the moment), it would be great if someone could provide insight into what causes this issue on shared clusters.

sravs_227
New Contributor II

hey @GeKo 

did you find a solution?
