Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can't read/write tables with shared cluster

Ramonrcn
New Contributor III

Hi!

I have a pipeline that I can't execute successfully on a shared cluster. Basically, I run a query against multiple sources on my Databricks instance, including streaming tables (that's the reason I have to use a shared cluster).

But when it comes to writing the results to my hive_metastore bronze layer, I get insufficient-permission errors. The first one was that the cluster didn't have enough permissions to determine whether a given table existed; I got around that error with a quick and dirty Python solution:

def check_table(table):
    # Returns 1 if db_bronze.<table> can be queried, 2 if the SELECT fails
    # (table missing, or the cluster lacks the metadata permissions).
    try:
        spark.sql(f"SELECT * FROM db_bronze.{table} LIMIT 1")
        return 1
    except Exception:
        return 2
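(Side note: on recent runtimes, spark.catalog.tableExists can do the same check without selecting from the table. This is just a sketch assuming Spark 3.3+ / a recent DBR, and it may hit the same metadata-permission wall on a shared cluster; check_table_v2 is only an illustrative name:)

def check_table_v2(table):
    # Returns 1 if db_bronze.<table> is visible to this user/cluster, 2 otherwise,
    # keeping the same 1/2 contract as the helper above.
    return 1 if spark.catalog.tableExists(f"db_bronze.{table}") else 2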
 
and my Spark code is the following:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Check if the target table exists
exists = check_table(table)

if exists == 2:
    # Table does not exist yet: keep only the latest record per ID, then create the table
    windowSpec = Window.partitionBy("ID").orderBy(F.desc("ingestDatetime"))
    df = df.withColumn("rank", F.row_number().over(windowSpec)) \
           .filter(F.col("rank") == 1) \
           .drop("rank")
    df.write \
      .format("delta") \
      .mode("append") \
      .option("path", bronze_path_full) \
      .option("mergeSchema", "true") \
      .saveAsTable(f"db_bronze.{table}")
else:
    # Table already exists: deduplicate the incoming data and MERGE it in
    df.createOrReplaceTempView("vw_source")
    spark.sql("""
        CREATE OR REPLACE TEMP VIEW vw_latest_source AS
        SELECT *
        FROM (
            SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ingestDatetime DESC) as row_num
            FROM vw_source
        ) tmp
        WHERE row_num = 1
    """)
    spark.sql(f"""
        MERGE INTO db_bronze.{table} as target
        USING vw_latest_source as source
        ON target.ID = source.ID
        WHEN MATCHED AND source.ingestDatetime > target.ingestDatetime THEN
            UPDATE SET *
        WHEN NOT MATCHED THEN
            INSERT *
    """)
When I run this insert cell, it gives this error:
org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
 
I've already tried:
Granting permissions via SQL in the notebook: grant permission to my user, grant ownership, grant the all-files permission, but I can't work around this issue.

My user has all the permissions to create, modify, and delete tables in hive_metastore and other catalogs, but I suspect it is a cluster-level permission issue.
Has anyone faced the same situation?
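
(Side note on the write path: the same dedup + upsert can also be expressed with the Delta Lake Python API instead of temp views + SQL MERGE. A minimal sketch, assuming the table already exists and that df_latest is the DataFrame produced by the row_number() dedup step above; it does not change the permission picture, it is just another way to phrase the merge:)

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, f"db_bronze.{table}")
(target.alias("t")
    .merge(df_latest.alias("s"), "t.ID = s.ID")
    # Only overwrite a row when the incoming record is newer
    .whenMatchedUpdateAll(condition="s.ingestDatetime > t.ingestDatetime")
    .whenNotMatchedInsertAll()
    .execute())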
8 REPLIES

MariuszK
Contributor III

What path do you use to save data?

Ramonrcn
New Contributor III
bronze_path = "abfss://bronze@datalake.dfs.core.windows.net/"
table = "my_report"
bronze_path_full = bronze_path + table

MariuszK
Contributor III

Is datalake the correct name of your storage? You should check whether you have access to the external locations in UC and whether the name is correct.

Ramonrcn
New Contributor III

datalake is just a placeholder name. Do you mean checking whether my user has access to external locations in UC, or whether the cluster has access?
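
(One way to narrow down whether it is the cluster or the user: from a notebook attached to that shared cluster, try listing the path directly. A minimal sketch, reusing bronze_path_full from above:)

# If this fails with a permission error on the shared cluster but works on a
# single-user cluster, the problem is the cluster's access (e.g. UC external
# location / storage credential), not the user's table grants.
try:
    display(dbutils.fs.ls(bronze_path_full))
except Exception as e:
    print(f"Cannot list {bronze_path_full}: {e}")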

saurabh18cs
Valued Contributor III
-- Grant SELECT, INSERT, UPDATE, DELETE permissions on the table
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE db_bronze.<table> TO `user@example.com`;

-- Grant ALL PRIVILEGES on the database
GRANT ALL PRIVILEGES ON DATABASE db_bronze TO `user@example.com`;

-- Grant OWNERSHIP on the table
GRANT OWNERSHIP ON TABLE db_bronze.<table> TO `user@example.com`;
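
(It can also help to confirm what is actually granted before and after running these. A sketch using the legacy table-ACL syntax for hive_metastore; on Unity Catalog securables the keyword is SHOW GRANTS instead, and my_report is just the table name from earlier in the thread:)

# Show current grants on the table and the database (legacy hive_metastore table ACLs)
display(spark.sql("SHOW GRANT ON TABLE db_bronze.my_report"))
display(spark.sql("SHOW GRANT ON DATABASE db_bronze"))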

Ramonrcn
New Contributor III

I already tried this solution; it throws this error:
SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not own database `db_bronze`.
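
(That error suggests the GRANT itself has to be executed by the current owner of db_bronze, or by a workspace admin, not by the user being granted. As far as I know, on Databricks ownership is transferred with ALTER ... OWNER TO rather than GRANT OWNERSHIP; a sketch of what the admin could run, with the principal taken from the placeholder used later in this thread:)

# Run as the current owner of db_bronze or as a workspace admin
spark.sql("ALTER SCHEMA db_bronze OWNER TO `myuser@company`")
# or only for the one table:
spark.sql("ALTER TABLE db_bronze.my_report OWNER TO `myuser@company`")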

Ramonrcn
New Contributor III

Just an update to this topic:
A few days ago, my Databricks admin and I were looking at the same notebook on the shared cluster; he ran the insert cell and a 'grant all permissions on db_bronze to myuser@company'.

It took a while, but the table was created and populated. Fast forward a few days: we had to change the logic of the SQL query. I updated the query, but I need to drop/truncate the previously created table and, guess what: insufficient permissions.

This specific catalog is in hive_metastore. What permission do I need to ask my admin to grant to my profile?

Thanks in advance!

Nivethan_Venkat
New Contributor III

Hi @Ramonrcn,

If I understand your question correctly, you need the MODIFY / ALL PRIVILEGES permission on the table in order to drop or modify it. And if you are performing this change using a Managed Identity / IAM principal, the same permission mentioned above has to be applied to that principal.

Please check the highlighted section in the Databricks docs for the necessary privileges: https://docs.databricks.com/aws/en/data-governance/table-acls/object-privileges#privileges-you-can-g...
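
(To see exactly what to ask for, it may help to first check who currently owns the objects. A small sketch using DESCRIBE, with the table name from earlier in the thread:)

# The Owner field in the extended output shows whose ownership (or whose grant)
# you need to request from the admin.
display(spark.sql("DESCRIBE SCHEMA EXTENDED db_bronze"))
display(spark.sql("DESCRIBE TABLE EXTENDED db_bronze.my_report"))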


Best Regards,
Nivethan 
