
dbutils.fs.cp requires write permissions on the source

mwoods
New Contributor III

I have an external location set up, "auth_kafka", which is mapped to an abfss URL:

abfss://{container}@{account}.dfs.core.windows.net/auth/kafka

and, critically, it is marked as read-only.

Using dbutils.fs I can successfully read the files (i.e. ls and head calls against files in that location all work), but I cannot run dbutils.fs.cp to copy files from there to DBFS, as follows:

dbutils.fs.cp("abfss://{container}@{account}.dfs.core.windows.net/auth/kafka/client.truststore.jks", "dbfs:/FileStore/Certs/client.truststore.jks")

This results in the following error (PERMISSION_DENIED: User does not have WRITE FILES on External Location 'auth_kafka'):

ExecutionError: An error occurred while calling o548.cp.
: java.io.IOException: Server-side copy has failed. Please try disabling it through `databricks.spark.dbutils.fs.cp.server-side.enabled`
at com.databricks.backend.daemon.dbutils.FSUtils.cpRecursive(DBUtilsCore.scala:400)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$3(DBUtilsCore.scala:336)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$withCpSafetyChecks$2(DBUtilsCore.scala:160)
at com.databricks.backend.daemon.dbutils.FSUtils.withFsSafetyCheck(DBUtilsCore.scala:145)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$withCpSafetyChecks$1(DBUtilsCore.scala:152)
at com.databricks.backend.daemon.dbutils.FSUtils.withFsSafetyCheck(DBUtilsCore.scala:145)
at com.databricks.backend.daemon.dbutils.FSUtils.withCpSafetyChecks(DBUtilsCore.scala:152)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$2(DBUtilsCore.scala:333)
at com.databricks.backend.daemon.dbutils.FSUtils.checkPermission(DBUtilsCore.scala:140)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$1(DBUtilsCore.scala:333)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:666)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:684)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:196)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
at com.databricks.backend.daemon.dbutils.FSUtils.withAttributionContext(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
at com.databricks.backend.daemon.dbutils.FSUtils.withAttributionTags(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:661)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:580)
at com.databricks.backend.daemon.dbutils.FSUtils.recordOperationWithResultTags(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:540)
at com.databricks.backend.daemon.dbutils.FSUtils.recordOperation(DBUtilsCore.scala:69)
at com.databricks.backend.daemon.dbutils.FSUtils.recordDbutilsFsOp(DBUtilsCore.scala:133)
at com.databricks.backend.daemon.dbutils.FSUtils.cp(DBUtilsCore.scala:332)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have WRITE FILES on External Location 'auth_kafka'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:47)
at com.databricks.managedcatalog.UCReliableHttpClient.postJson(UCReliableHttpClient.scala:63)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$generateTemporaryPathCredentials$1(ManagedCatalogClientImpl.scala:3262)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:3674)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:3673)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:139)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:3670)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.generateTemporaryPathCredentials(ManagedCatalogClientImpl.scala:3253)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.generateTemporaryPathCredentials(ManagedCatalogCommon.scala:1456)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$generateTemporaryPathCredentials$2(ProfiledManagedCatalog.scala:564)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:319)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:55)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:54)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.generateTemporaryPathCredentials(ProfiledManagedCatalog.scala:564)
at com.databricks.unity.CredentialScopeSQLHelper$.checkPathOperations(CredentialScopeSQLHelper.scala:95)
at com.databricks.unity.CredentialScopeSQLHelper$.registerExternalLocationPath(CredentialScopeSQLHelper.scala:197)
at com.databricks.unity.CredentialScopeSQLHelper$.register(CredentialScopeSQLHelper.scala:154)
at com.databricks.unity.CredentialScopeSQLHelper$.registerPathAccess(CredentialScopeSQLHelper.scala:443)
at com.databricks.backend.daemon.dbutils.ExternalLocationHelper$.$anonfun$registerPaths$1(ExternalLocationHelper.scala:48)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at com.databricks.backend.daemon.dbutils.ExternalLocationHelper$.registerPaths(ExternalLocationHelper.scala:41)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$3(DBUtilsCore.scala:334)
... 40 more

That particular error relates to the user not having write permissions on the external location. I can add a grant to give the user WRITE FILES on it, but that just defers the issue to the next tier, resulting in the error:

Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User cannot write to a read-only external location auth_kafka

since the external location is flagged as read-only.
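For reference, the grant I tried above looks something like the following, run from a notebook (a sketch; the principal is a placeholder, not my real user):

# Grant WRITE FILES on the external location to a (placeholder) principal.
spark.sql("GRANT WRITE FILES ON EXTERNAL LOCATION auth_kafka TO `user@example.com`")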

This is weird to me: why is write access needed to copy from the abfss location? Surely it only needs to read it? I can confirm that opening up the permissions on the external location to allow writes resolves the issue, but that rather defeats the purpose of marking it read-only.
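The IOException above does suggest disabling server-side copy as a workaround, which presumably forces the read through the cluster instead of requesting a server-side copy credential. I haven't confirmed it, but it would look something like this, assuming the flag named in the error message can be set as a session-level Spark conf:

# Hint taken from the IOException message above; assumes the flag is a
# session-level Spark conf (unverified).
spark.conf.set("databricks.spark.dbutils.fs.cp.server-side.enabled", "false")

# Retry the copy; with server-side copy disabled, only READ FILES should
# (in theory) be needed on the source external location.
dbutils.fs.cp("abfss://{container}@{account}.dfs.core.windows.net/auth/kafka/client.truststore.jks", "dbfs:/FileStore/Certs/client.truststore.jks")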


Kaniz
Community Manager

Hi @mwoods, in Python you can use the pickle module to serialize and de-serialize Python object structures. You can save your variables to a file with pickle.dump() and then load them in another notebook using pickle.load().

Here's how you can do it:

In notebook1.ipynb:

import pickle

var1 = "dallas"
var_lst = [100, 200, 300, 400, 500]

# Serialize both variables into a single file.
with open('variables.pkl', 'wb') as f:
    pickle.dump([var1, var_lst], f)

Then, in notebook2.ipynb:

import pickle

# Load the variables back in the order they were dumped.
with open('variables.pkl', 'rb') as f:
    var1, var_lst = pickle.load(f)

print(var1)
print(var_lst)

In the above code, pickle.dump() is used to write the serialized representation of the variables var1 and var_lst to the file variables.pkl. pickle.load() is then used to load these variables back into memory in notebook2.ipynb.

Please note that the pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

mwoods
New Contributor III

@Kaniz Sorry...not sure how that's relevant? Was that posted to the wrong topic?

This question is about what appears to be a bug in dbutils.fs, where the cp function requires write access to the data source (as opposed to just read access), i.e. write access should only be necessary on the destination.
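In the meantime, a workaround that avoids granting write on the source is to pull the bytes through the cluster and write them out manually. A sketch, assuming Spark's binaryFile reader can read the file with READ FILES alone and that staging via the driver's local disk is acceptable:

# Read the file's bytes with Spark's binaryFile source; this should only
# need READ FILES on the external location (assumption).
src = "abfss://{container}@{account}.dfs.core.windows.net/auth/kafka/client.truststore.jks"
content = spark.read.format("binaryFile").load(src).select("content").head()[0]

# Stage the bytes on the driver's local disk, then copy into DBFS.
local_path = "/tmp/client.truststore.jks"
with open(local_path, "wb") as f:
    f.write(content)

dbutils.fs.cp(f"file:{local_path}", "dbfs:/FileStore/Certs/client.truststore.jks")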
