09-20-2023 02:12 AM
I have an external location set up, "auth_kafka", which is mapped to an abfss url and, critically, is marked as read-only.
Using dbutils.fs I can successfully read the files (i.e. the ls and head calls against files in that location all work), but I cannot run dbutils.fs.cp to copy files from there to DBFS.
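A minimal sketch of the calls involved (the paths below are illustrative placeholders, not the real ones):

```python
# Listing and reading files in the read-only external location both work
dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/auth_kafka/")
dbutils.fs.head("abfss://container@account.dfs.core.windows.net/auth_kafka/some-file.json")

# Copying from the external location to DBFS fails
dbutils.fs.cp(
    "abfss://container@account.dfs.core.windows.net/auth_kafka/some-file.json",
    "dbfs:/tmp/some-file.json",
)
```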
This results in the following error (PERMISSION_DENIED: User does not have WRITE FILES on External Location 'auth_kafka'.)
ExecutionError: An error occurred while calling o548.cp.
: java.io.IOException: Server-side copy has failed. Please try disabling it through `databricks.spark.dbutils.fs.cp.server-side.enabled`
at com.databricks.backend.daemon.dbutils.FSUtils.cpRecursive(DBUtilsCore.scala:400)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$3(DBUtilsCore.scala:336)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$withCpSafetyChecks$2(DBUtilsCore.scala:160)
at com.databricks.backend.daemon.dbutils.FSUtils.withFsSafetyCheck(DBUtilsCore.scala:145)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$withCpSafetyChecks$1(DBUtilsCore.scala:152)
at com.databricks.backend.daemon.dbutils.FSUtils.withFsSafetyCheck(DBUtilsCore.scala:145)
at com.databricks.backend.daemon.dbutils.FSUtils.withCpSafetyChecks(DBUtilsCore.scala:152)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$2(DBUtilsCore.scala:333)
at com.databricks.backend.daemon.dbutils.FSUtils.checkPermission(DBUtilsCore.scala:140)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$1(DBUtilsCore.scala:333)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:666)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:684)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:196)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
at com.databricks.backend.daemon.dbutils.FSUtils.withAttributionContext(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
at com.databricks.backend.daemon.dbutils.FSUtils.withAttributionTags(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:661)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:580)
at com.databricks.backend.daemon.dbutils.FSUtils.recordOperationWithResultTags(DBUtilsCore.scala:69)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:540)
at com.databricks.backend.daemon.dbutils.FSUtils.recordOperation(DBUtilsCore.scala:69)
at com.databricks.backend.daemon.dbutils.FSUtils.recordDbutilsFsOp(DBUtilsCore.scala:133)
at com.databricks.backend.daemon.dbutils.FSUtils.cp(DBUtilsCore.scala:332)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have WRITE FILES on External Location 'auth_kafka'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:47)
at com.databricks.managedcatalog.UCReliableHttpClient.postJson(UCReliableHttpClient.scala:63)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$generateTemporaryPathCredentials$1(ManagedCatalogClientImpl.scala:3262)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:3674)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:3673)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:139)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:3670)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.generateTemporaryPathCredentials(ManagedCatalogClientImpl.scala:3253)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.generateTemporaryPathCredentials(ManagedCatalogCommon.scala:1456)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$generateTemporaryPathCredentials$2(ProfiledManagedCatalog.scala:564)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:319)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:55)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:54)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.generateTemporaryPathCredentials(ProfiledManagedCatalog.scala:564)
at com.databricks.unity.CredentialScopeSQLHelper$.checkPathOperations(CredentialScopeSQLHelper.scala:95)
at com.databricks.unity.CredentialScopeSQLHelper$.registerExternalLocationPath(CredentialScopeSQLHelper.scala:197)
at com.databricks.unity.CredentialScopeSQLHelper$.register(CredentialScopeSQLHelper.scala:154)
at com.databricks.unity.CredentialScopeSQLHelper$.registerPathAccess(CredentialScopeSQLHelper.scala:443)
at com.databricks.backend.daemon.dbutils.ExternalLocationHelper$.$anonfun$registerPaths$1(ExternalLocationHelper.scala:48)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at com.databricks.backend.daemon.dbutils.ExternalLocationHelper$.registerPaths(ExternalLocationHelper.scala:41)
at com.databricks.backend.daemon.dbutils.FSUtils.$anonfun$cp$3(DBUtilsCore.scala:334)
... 40 more
That particular error relates to the user not having write permissions on the external location. I can add a grant to give the user WRITE FILES on it, but that just defers the issue to the next tier, resulting in the error:
Caused by: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User cannot write to a read-only external location auth_kafka
since the external location is flagged as readonly.
This is weird to me - why is write access needed to copy from the abfss location? Surely it only needs to read it? I can confirm that opening up the permissions on the external location to allow writes resolves the issue...but that kinda defeats the purpose?
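For reference, the two knobs in play, sketched in a notebook cell (the principal in the grant is a placeholder; the config key is the one quoted in the IOException above, which I haven't verified beyond that message):

```python
# Granting WRITE FILES on the external location only moves the failure to the
# "read-only external location" error shown above (principal is a placeholder).
spark.sql("GRANT WRITE FILES ON EXTERNAL LOCATION auth_kafka TO `someone@example.com`")

# The original IOException suggests disabling server-side copy via this config key.
spark.conf.set("databricks.spark.dbutils.fs.cp.server-side.enabled", "false")
```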
09-20-2023 04:35 AM
Hi @mwoods, in Python you can use the `pickle` module to serialize and de-serialize Python object structures. You can save your variables to a file with `pickle.dump()` and then load them in another notebook using `pickle.load()`.
Here's how you can do it:
In `notebook1.ipynb`:
```python
import pickle

var1 = "dallas"
var_lst = [100, 200, 300, 400, 500]

with open('variables.pkl', 'wb') as f:
    pickle.dump([var1, var_lst], f)
```
Then, in `notebook2.ipynb`:
```python
import pickle

with open('variables.pkl', 'rb') as f:
    var1, var_lst = pickle.load(f)

print(var1)
print(var_lst)
```
In the above code, `pickle.dump()` is used to write the serialized representation of the variables `var1` and `var_lst` to the file `variables.pkl`. `pickle.load()` is then used to load these variables back into memory in `notebook2.ipynb`.
Please note that the `pickle` module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source. Unfortunately, the provided sources do not contain relevant information to answer this question.
09-20-2023 06:07 AM - edited 09-20-2023 06:08 AM
@Kaniz Sorry...not sure how that's relevant? Was that posted to the wrong topic?
This question regards what appears to be a bug in dbutils.fs: the cp function requires write access on the data source (as opposed to just read access), when write access should only be necessary on the destination.