Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can't read/write tables with shared cluster

Ramonrcn
New Contributor III

Hi!

I have a pipeline that I can't execute successfully on a shared cluster. Basically, I run a query against multiple sources on my Databricks instance, including streaming tables (that's the reason I have to use a shared cluster).

But when it comes to writing the results to my hive_metastore bronze layer, I get insufficient-permission errors. The first one was that the cluster didn't have enough permissions to determine whether a given table existed; I got around that error with a quick and dirty Python solution:

def check_table(table):
    # Returns 1 if db_bronze.<table> can be queried, 2 if the SELECT fails
    # (table missing, or the cluster lacks the metadata permissions).
    try:
        spark.sql(f"SELECT * FROM db_bronze.{table} LIMIT 1")
        return 1
    except Exception:
        return 2
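(Side note: on recent runtimes, spark.catalog.tableExists can do the same check without selecting from the table. This is just a sketch assuming Spark 3.3+ / a recent DBR, and it may hit the same metadata-permission wall on a shared cluster; check_table_v2 is only an illustrative name:)

def check_table_v2(table):
    # Returns 1 if db_bronze.<table> is visible to this user/cluster, 2 otherwise,
    # keeping the same 1/2 contract as the helper above.
    return 1 if spark.catalog.tableExists(f"db_bronze.{table}") else 2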
 
and my Spark code is the following:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Check if the target table exists
exists = check_table(table)

if exists == 2:
    # Table does not exist yet: keep only the latest record per ID, then create the table
    windowSpec = Window.partitionBy("ID").orderBy(F.desc("ingestDatetime"))
    df = df.withColumn("rank", F.row_number().over(windowSpec)) \
           .filter(F.col("rank") == 1) \
           .drop("rank")
    df.write \
      .format("delta") \
      .mode("append") \
      .option("path", bronze_path_full) \
      .option("mergeSchema", "true") \
      .saveAsTable(f"db_bronze.{table}")
else:
    # Table already exists: deduplicate the incoming data and MERGE it in
    df.createOrReplaceTempView("vw_source")
    spark.sql("""
        CREATE OR REPLACE TEMP VIEW vw_latest_source AS
        SELECT *
        FROM (
            SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ingestDatetime DESC) as row_num
            FROM vw_source
        ) tmp
        WHERE row_num = 1
    """)
    spark.sql(f"""
        MERGE INTO db_bronze.{table} as target
        USING vw_latest_source as source
        ON target.ID = source.ID
        WHEN MATCHED AND source.ingestDatetime > target.ingestDatetime THEN
            UPDATE SET *
        WHEN NOT MATCHED THEN
            INSERT *
    """)
When I run this insert cell, it gives this error:
org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
 
I've already tried:
Granting permissions via SQL in the notebook: grant permission to my user, grant ownership, grant the all-files permission, but I can't work around this issue.

My user has all the permissions to create, modify, and delete tables in hive_metastore and other catalogs, but I suspect it is a cluster-level permission issue.
Has anyone faced the same situation?
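
(Side note on the write path: the same dedup + upsert can also be expressed with the Delta Lake Python API instead of temp views + SQL MERGE. A minimal sketch, assuming the table already exists and that df_latest is the DataFrame produced by the row_number() dedup step above; it does not change the permission picture, it is just another way to phrase the merge:)

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, f"db_bronze.{table}")
(target.alias("t")
    .merge(df_latest.alias("s"), "t.ID = s.ID")
    # Only overwrite a row when the incoming record is newer
    .whenMatchedUpdateAll(condition="s.ingestDatetime > t.ingestDatetime")
    .whenNotMatchedInsertAll()
    .execute())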
8 REPLIES

MariuszK
Contributor III

What path do you use to save data?

Ramonrcn
New Contributor III
bronze_path = "abfss://bronze@datalake.dfs.core.windows.net/"
table = "my_report"
bronze_path_full = bronze_path + table

MariuszK
Contributor III

Is datalake the correct name of your storage? You should check whether you have access to the external locations in UC and whether the name is correct.

Ramonrcn
New Contributor III

datalake is just a placeholder name. Do you mean checking whether my user has access to external locations in UC, or whether the cluster has access?
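
(One way to narrow down whether it is the cluster or the user: from a notebook attached to that shared cluster, try listing the path directly. A minimal sketch, reusing bronze_path_full from above:)

# If this fails with a permission error on the shared cluster but works on a
# single-user cluster, the problem is the cluster's access (e.g. UC external
# location / storage credential), not the user's table grants.
try:
    display(dbutils.fs.ls(bronze_path_full))
except Exception as e:
    print(f"Cannot list {bronze_path_full}: {e}")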

saurabh18cs
Valued Contributor III
-- Grant SELECT, INSERT, UPDATE, DELETE permissions on the table
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE db_bronze.<table> TO `user@example.com`;

-- Grant ALL PRIVILEGES on the database
GRANT ALL PRIVILEGES ON DATABASE db_bronze TO `user@example.com`;

-- Grant OWNERSHIP on the table
GRANT OWNERSHIP ON TABLE db_bronze.<table> TO `user@example.com`;
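
(It can also help to confirm what is actually granted before and after running these. A sketch using the legacy table-ACL syntax for hive_metastore; on Unity Catalog securables the keyword is SHOW GRANTS instead, and my_report is just the table name from earlier in the thread:)

# Show current grants on the table and the database (legacy hive_metastore table ACLs)
display(spark.sql("SHOW GRANT ON TABLE db_bronze.my_report"))
display(spark.sql("SHOW GRANT ON DATABASE db_bronze"))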

Ramonrcn
New Contributor III

I already tried this solution; it throws this error:
SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not own database `db_bronze`.
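
(That error suggests the GRANT itself has to be executed by the current owner of db_bronze, or by a workspace admin, not by the user being granted. As far as I know, on Databricks ownership is transferred with ALTER ... OWNER TO rather than GRANT OWNERSHIP; a sketch of what the admin could run, with the principal taken from the placeholder used later in this thread:)

# Run as the current owner of db_bronze or as a workspace admin
spark.sql("ALTER SCHEMA db_bronze OWNER TO `myuser@company`")
# or only for the one table:
spark.sql("ALTER TABLE db_bronze.my_report OWNER TO `myuser@company`")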

Ramonrcn
New Contributor III

Just an update to this topic:
A few days ago, my Databricks admin and I were looking at the same notebook on the shared cluster; he ran the insert cell and a 'grant all permissions on db_bronze to myuser@company'.

It took a while, but the table was created and populated. Fast forward a few days: we had to change the logic of the SQL query. I updated the query, but I need to drop/truncate the previously created table and, guess what: insufficient permissions.

This specific catalog is in hive_metastore. What permission do I need to ask my admin to grant to my profile?

Thanks in advance!

Nivethan_Venkat
New Contributor III

Hi @Ramonrcn,

If I understand your question correctly, you need the MODIFY / ALL PRIVILEGES permission on the table in order to drop or modify it. And if you are performing this change using a Managed Identity / IAM principal, the same permission mentioned above has to be applied to that principal.

Please check the highlighted section in the Databricks docs for the necessary privileges: https://docs.databricks.com/aws/en/data-governance/table-acls/object-privileges#privileges-you-can-g...
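
(To see exactly what to ask for, it may help to first check who currently owns the objects. A small sketch using DESCRIBE, with the table name from earlier in the thread:)

# The Owner field in the extended output shows whose ownership (or whose grant)
# you need to request from the admin.
display(spark.sql("DESCRIBE SCHEMA EXTENDED db_bronze"))
display(spark.sql("DESCRIBE TABLE EXTENDED db_bronze.my_report"))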


Best Regards,
Nivethan 
