05-27-2025 04:07 AM
If I try to modify a shallow cloned table with partitionOverwriteMode set to dynamic on a "dedicated/single user" cluster (DBR 16.4), I get the following error message:
Py4JJavaError: An error occurred while calling o483.saveAsTable.
: org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://<TABLE_STORAGE_PLACE>. SQLSTATE: 58030
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:996)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.enrichWriteError(FileFormatDataWriter.scala:109)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:120)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:128)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:559)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1628)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:566)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:125)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:938)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:938)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:413)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:377)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:225)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:199)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:161)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:155)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:1043)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:111)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1046)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:933)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1413)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1401)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3171)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:3152)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$6(FileFormatWriter.scala:435)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measureMs(MetricKey.scala:1195)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$5(FileFormatWriter.scala:433)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:395)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:431)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:300)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:121)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$5(commands.scala:137)
at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$4(commands.scala:137)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:133)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:132)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$doExecute$4(commands.scala:161)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:161)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$2(SparkPlan.scala:341)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:341)
at org.apache.spark.sql.execution.SparkPlan$.org$apache$spark$sql$execution$SparkPlan$$withExecuteQueryLogging(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:399)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:395)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:336)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$doExecute$1(AdaptiveSparkPlanExec.scala:981)
at org.apache.spark.sql.execution.adaptive.ResultQueryStageExec.$anonfun$doMaterialize$1(QueryStageExec.scala:663)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$7(SQLExecution.scala:905)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$6(SQLExecution.scala:905)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$5(SQLExecution.scala:905)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$4(SQLExecution.scala:904)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$3(SQLExecution.scala:903)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction$.withActive(OptimisticTransaction.scala:216)
at com.databricks.sql.transaction.tahoe.ConcurrencyHelpers$.withOptimisticTransaction(ConcurrencyHelpers.scala:54)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:902)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:886)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$4(SparkThreadLocalForwardingThreadPoolExecutor.scala:113)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:112)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:89)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:154)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have MODIFY on Table 'catalog1.schema1.product_sales'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:152)
at com.databricks.managedcatalog.UCReliableHttpClient.postJsonWithOptions(UCReliableHttpClient.scala:190)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.generateTemporaryTableCredentials(ManagedCatalogClientImpl.scala:3463)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getTableCredentials$1(ManagedCatalogClientImpl.scala:3521)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:6873)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:6872)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:37)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:35)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:216)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:6853)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getTableCredentials(ManagedCatalogClientImpl.scala:3503)
at com.databricks.sql.managedcatalog.ManagedCatalogClient.getTemporaryCredentials(ManagedCatalogClient.scala:2398)
at com.databricks.sql.managedcatalog.ManagedCatalogClient.getTemporaryCredentials$(ManagedCatalogClient.scala:2383)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getTemporaryCredentials(ManagedCatalogClientImpl.scala:216)
at com.databricks.unity.TempCredCache.$anonfun$getInternal$7(TemporaryCredentials.scala:392)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.unity.TempCredCache.liftedTree1$1(TemporaryCredentials.scala:391)
at com.databricks.unity.TempCredCache.getInternal(TemporaryCredentials.scala:390)
at com.databricks.unity.TempCredCache.get(TemporaryCredentials.scala:319)
at com.databricks.unity.UnityCredentialManager.getTemporaryCredentials(CredentialManager.scala:471)
at com.databricks.unity.CredentialManager$.getTemporaryCredentials(CredentialManager.scala:849)
at com.databricks.unity.CredentialManagerRpcHelper.$anonfun$getTemporaryCredentials$1(UCSDriver.scala:280)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
at com.databricks.unity.CredentialManagerRpcHelper.runWithScopeAndClose(UCSDriver.scala:254)
at com.databricks.unity.CredentialManagerRpcHelper.runWithScopeAndClose$(UCSDriver.scala:251)
at com.databricks.unity.CredentialManagerRpcHelper$.runWithScopeAndClose(UCSDriver.scala:285)
at com.databricks.unity.CredentialManagerRpcHelper.getTemporaryCredentials(UCSDriver.scala:280)
at com.databricks.unity.CredentialManagerRpcHelper.getTemporaryCredentials$(UCSDriver.scala:278)
at com.databricks.unity.CredentialManagerRpcHelper$.getTemporaryCredentials(UCSDriver.scala:285)
at org.apache.spark.unity.CredentialRpcEndpoint$$anonfun$receiveAndReply$1.applyOrElse(CredentialRpcEndpoint.scala:45)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:104)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:216)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:76)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:42)
... 12 more
I have full access to the cloned table, but only SELECT rights on the source table.
On a "Standard/Shared" cluster it works without error! It also works if the clone is deep, for obvious reasons.
Example script:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# Clean up and setup
partitioned_table = f"catalog1.schema1.product_sales"
cloned_table = f"catalog2.schema2.product_sales_clone"
spark.sql(f"DROP TABLE IF EXISTS {partitioned_table}")
spark.sql(f"DROP TABLE IF EXISTS {cloned_table}")
# 1. Create dummy DataFrame
schema_def = StructType([
    StructField("region", StringType(), True),
    StructField("product", StringType(), True),
    StructField("revenue", IntegerType(), True)
])
data = [
    ("North", "Widgets", 300),
    ("North", "Gadgets", 200),
    ("South", "Widgets", 150),
    ("South", "Gadgets", 100),
]
df = spark.createDataFrame(data, schema=schema_def)
# 2. Write and revoke MODIFY rights
(df.write.format("delta")
.partitionBy("region")
.mode("overwrite")
.saveAsTable(partitioned_table))
# TODO: how you gave rights (catalog, schema, table)
# catalog_name = partitioned_table.split(".")[0]
# schema_name = partitioned_table.split(".")[1]
# spark.sql(f"REVOKE MODIFY ON TABLE {partitioned_table} FROM CURRENT_USER()")
# spark.sql(f"REVOKE MODIFY ON SCHEMA {catalog_name}.{schema_name} FROM CURRENT_USER()")
# spark.sql(f"REVOKE MODIFY ON CATALOG {catalog_name} FROM CURRENT_USER()")
# 3. Create a shallow clone
spark.sql(f"""
CREATE OR REPLACE TABLE {cloned_table}
SHALLOW CLONE {partitioned_table}
""")
# 4. Create new dummy data for a single partition
new_data = [
    ("North", "Widgets", 999),
    ("North", "Gadgets", 888)
]
new_df = spark.createDataFrame(new_data, schema=schema_def)
# 5. Overwrite only the 'North' partition of the cloned table dynamically
(new_df.write.format("delta")
.partitionBy("region")
.mode("overwrite")
.option("partitionOverwriteMode", "dynamic")
.saveAsTable(cloned_table))
05-28-2025 02:37 PM
Hey @der
"I have full access on the cloned table, but only select rights on the source table."
When working with shallow clones in Unity Catalog on a dedicated or single-user cluster, Databricks enforces strict permission inheritance from the source table.
To perform any update/insert on a shallow cloned table, you must have:
USE CATALOG on the source catalog and USE SCHEMA on the source schema
SELECT permission on the source table
MODIFY permission on the source table
Even if you’re modifying the cloned table, the underlying Delta metadata and lineage trace back to the original table, and Unity Catalog enforces this by default (Docs_dedicated).
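If granting rights on the source table is an option for you, a minimal sketch of what that could look like (the principal is just a placeholder, and this assumes you are allowed to grant on the source objects):
# Placeholder principal; replace `user@example.com` with the user or group that writes to the clone.
spark.sql("GRANT USE CATALOG ON CATALOG catalog1 TO `user@example.com`")
spark.sql("GRANT USE SCHEMA ON SCHEMA catalog1.schema1 TO `user@example.com`")
spark.sql("GRANT SELECT, MODIFY ON TABLE catalog1.schema1.product_sales TO `user@example.com`")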
In standard (shared) access mode it works because that mode follows a different privilege model (Docs_standard).
Hope this helps, 🙂
Isi
06-02-2025 06:50 AM
@Isi Thank you for the link to the documentation; I hadn't found it myself!