Data Engineering

Permission denied on shallow cloned table write on single cluster

der
New Contributor III

When I try to modify a shallow cloned table with partitionOverwriteMode set to dynamic on a "dedicated/single user" cluster (DBR 16.4), I get the following error message:

 

Py4JJavaError: An error occurred while calling o483.saveAsTable.
: org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://<TABLE_STORAGE_PLACE>. SQLSTATE: 58030
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:996)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.enrichWriteError(FileFormatDataWriter.scala:109)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:120)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:128)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:559)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1628)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:566)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:125)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:938)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:938)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:413)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:377)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:225)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:199)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:161)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:155)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:1043)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:111)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1046)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:933)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1413)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1401)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3171)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:3152)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$6(FileFormatWriter.scala:435)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measureMs(MetricKey.scala:1195)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$5(FileFormatWriter.scala:433)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:395)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:431)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:300)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:121)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$5(commands.scala:137)
at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$4(commands.scala:137)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:133)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:132)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$doExecute$4(commands.scala:161)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:161)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$2(SparkPlan.scala:341)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:341)
at org.apache.spark.sql.execution.SparkPlan$.org$apache$spark$sql$execution$SparkPlan$$withExecuteQueryLogging(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:399)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:395)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:336)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$doExecute$1(AdaptiveSparkPlanExec.scala:981)
at org.apache.spark.sql.execution.adaptive.ResultQueryStageExec.$anonfun$doMaterialize$1(QueryStageExec.scala:663)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$7(SQLExecution.scala:905)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$6(SQLExecution.scala:905)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$5(SQLExecution.scala:905)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$4(SQLExecution.scala:904)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$3(SQLExecution.scala:903)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction$.withActive(OptimisticTransaction.scala:216)
at com.databricks.sql.transaction.tahoe.ConcurrencyHelpers$.withOptimisticTransaction(ConcurrencyHelpers.scala:54)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:902)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:886)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$4(SparkThreadLocalForwardingThreadPoolExecutor.scala:113)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:112)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:89)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:154)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have MODIFY on Table 'catalog1.schema1.product_sales'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:152)
at com.databricks.managedcatalog.UCReliableHttpClient.postJsonWithOptions(UCReliableHttpClient.scala:190)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.generateTemporaryTableCredentials(ManagedCatalogClientImpl.scala:3463)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getTableCredentials$1(ManagedCatalogClientImpl.scala:3521)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:6873)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:6872)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:37)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:35)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:216)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:6853)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getTableCredentials(ManagedCatalogClientImpl.scala:3503)
at com.databricks.sql.managedcatalog.ManagedCatalogClient.getTemporaryCredentials(ManagedCatalogClient.scala:2398)
at com.databricks.sql.managedcatalog.ManagedCatalogClient.getTemporaryCredentials$(ManagedCatalogClient.scala:2383)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getTemporaryCredentials(ManagedCatalogClientImpl.scala:216)
at com.databricks.unity.TempCredCache.$anonfun$getInternal$7(TemporaryCredentials.scala:392)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.unity.TempCredCache.liftedTree1$1(TemporaryCredentials.scala:391)
at com.databricks.unity.TempCredCache.getInternal(TemporaryCredentials.scala:390)
at com.databricks.unity.TempCredCache.get(TemporaryCredentials.scala:319)
at com.databricks.unity.UnityCredentialManager.getTemporaryCredentials(CredentialManager.scala:471)
at com.databricks.unity.CredentialManager$.getTemporaryCredentials(CredentialManager.scala:849)
at com.databricks.unity.CredentialManagerRpcHelper.$anonfun$getTemporaryCredentials$1(UCSDriver.scala:280)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
at com.databricks.unity.CredentialManagerRpcHelper.runWithScopeAndClose(UCSDriver.scala:254)
at com.databricks.unity.CredentialManagerRpcHelper.runWithScopeAndClose$(UCSDriver.scala:251)
at com.databricks.unity.CredentialManagerRpcHelper$.runWithScopeAndClose(UCSDriver.scala:285)
at com.databricks.unity.CredentialManagerRpcHelper.getTemporaryCredentials(UCSDriver.scala:280)
at com.databricks.unity.CredentialManagerRpcHelper.getTemporaryCredentials$(UCSDriver.scala:278)
at com.databricks.unity.CredentialManagerRpcHelper$.getTemporaryCredentials(UCSDriver.scala:285)
at org.apache.spark.unity.CredentialRpcEndpoint$$anonfun$receiveAndReply$1.applyOrElse(CredentialRpcEndpoint.scala:45)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:104)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:216)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:76)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:42)
... 12 more

I have full access on the cloned table, but only SELECT rights on the source table.

On a "Standard/Shared" cluster the same write works without error! It also works if the clone is a deep clone, for obvious reasons (a deep clone variant is sketched after the example script below).

Example script:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Clean up and setup
partitioned_table = "catalog1.schema1.product_sales"
cloned_table = "catalog2.schema2.product_sales_clone"

spark.sql(f"DROP TABLE IF EXISTS {partitioned_table}")
spark.sql(f"DROP TABLE IF EXISTS {cloned_table}")

# 1. Create dummy DataFrame
schema_def = StructType([
    StructField("region", StringType(), True),
    StructField("product", StringType(), True),
    StructField("revenue", IntegerType(), True)
])

data = [
    ("North", "Widgets", 300),
    ("North", "Gadgets", 200),
    ("South", "Widgets", 150),
    ("South", "Gadgets", 100),
]
df = spark.createDataFrame(data, schema=schema_def)

# 2. Write and revoke MODIFY rights
(df.write.format("delta")
    .partitionBy("region")
    .mode("overwrite")
    .saveAsTable(partitioned_table))

# TODO: revoke MODIFY on the source at whichever level it was granted (catalog, schema, or table), e.g.:
# catalog_name = partitioned_table.split(".")[0]
# schema_name = partitioned_table.split(".")[1]
# spark.sql(f"REVOKE MODIFY ON TABLE {partitioned_table} FROM CURRENT_USER()")
# spark.sql(f"REVOKE MODIFY ON SCHEMA {catalog_name}.{schema_name} FROM CURRENT_USER()")
# spark.sql(f"REVOKE MODIFY ON CATALOG {catalog_name} FROM CURRENT_USER()")


# 3. Create a shallow clone
spark.sql(f"""
  CREATE OR REPLACE TABLE {cloned_table}
  SHALLOW CLONE {partitioned_table}
""")

# 4. Create new dummy data for a single partition
new_data = [
    ("North", "Widgets", 999),
    ("North", "Gadgets", 888)
]
new_df = spark.createDataFrame(new_data, schema=schema_def)

# 5. Overwrite only the 'North' partition of the cloned table dynamically
(new_df.write.format("delta")
    .partitionBy("region")
    .mode("overwrite")
    .option("partitionOverwriteMode", "dynamic")
    .saveAsTable(cloned_table))
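
For comparison, here is roughly what the deep clone variant mentioned above could look like. It avoids the error because a deep clone copies the data files, so the write no longer touches the source table's storage; the clone table name here is just an example:

# 6. (Alternative) Deep clone: the files are copied, so only rights on the clone itself are needed
deep_cloned_table = "catalog2.schema2.product_sales_deep_clone"  # example name

spark.sql(f"""
  CREATE OR REPLACE TABLE {deep_cloned_table}
  DEEP CLONE {partitioned_table}
""")

(new_df.write.format("delta")
    .partitionBy("region")
    .mode("overwrite")
    .option("partitionOverwriteMode", "dynamic")
    .saveAsTable(deep_cloned_table))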
2 REPLIES

Isi
Contributor III

Hey @der 

"I have full access on the cloned table, but only select rights on the source table."


When working with shallow clones in Unity Catalog on a dedicated or single-user cluster, Databricks enforces strict permission inheritance from the source table.

To perform any update/insert on a shallow cloned table, you must have (a GRANT sketch follows the list):

  • USE permission on the source catalog and schema

  • SELECT permission on the source table

  • MODIFY permission on the source table
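
A rough sketch of those grants, using the object names from the question; the principal is a placeholder that an admin or the table owner would replace with the user/group doing the write:

# Hypothetical grants on the SOURCE table (placeholder principal, adjust as needed)
principal = "`some_user@example.com`"

spark.sql(f"GRANT USE CATALOG ON CATALOG catalog1 TO {principal}")
spark.sql(f"GRANT USE SCHEMA ON SCHEMA catalog1.schema1 TO {principal}")
spark.sql(f"GRANT SELECT ON TABLE catalog1.schema1.product_sales TO {principal}")
spark.sql(f"GRANT MODIFY ON TABLE catalog1.schema1.product_sales TO {principal}")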

Even if you’re modifying the cloned table, the underlying Delta metadata and lineage trace back to the original table, and Unity Catalog enforces this by default (see Docs_dedicated).
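
One illustrative way to see that lineage on the clone from the example script (not from the original post) is the table history, which records the CLONE operation that created it:

# The CLONE entry's operationParameters describe how the table was cloned
history = spark.sql("DESCRIBE HISTORY catalog2.schema2.product_sales_clone")
history.select("version", "operation", "operationParameters").show(truncate=False)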

In standard mode it works because standard clusters follow a different privilege system (see Docs_standard).

Hope this helps, 🙂

Isi

der
New Contributor III

@Isi Thank you for the link to the documentation, I had not found it myself!
