Copy Local file using a Shared Cluster
02-13-2024 09:26 AM - edited 02-13-2024 09:27 AM
Hi,
I am saving some files locally on my cluster and moving them after my job. These are log files from my process, so I can't reference a DBFS location directly.
However, the dbutils.fs.cp command does not work on the shared cluster. It does work on an individual (single user) cluster, so I believe this is related to how shared clusters are isolated between users.
File location: "/home/spark-daed4064-233f-446c-b9f2-5b/log.txt"
Copy command:
import os

# path gets set to /home/spark-daed4064-233f-446c-b9f2-5b/
path = os.getcwd()
new_path = f"{path}/logs.txt"

# prints -> /home/spark-4c17311c-654a-4c71-b551-2e/logs.txt
print(new_path)

dbutils.fs.cp(new_path, "dbfs:/databricks/scripts/logs.txt")
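For what it's worth, here is a minimal sketch of the same copy with the local scheme made explicit (the file: prefix is my assumption about how to tell dbutils.fs.cp the source is a driver-local path rather than a DBFS path):

# Sketch only: make the driver-local source explicit with a file: prefix
# so dbutils.fs.cp does not interpret it as a DBFS path (assumption on my side)
import os

local_path = f"file:{os.getcwd()}/logs.txt"
dbutils.fs.cp(local_path, "dbfs:/databricks/scripts/logs.txt")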
02-13-2024 09:34 AM
For reference, when doing this on a single user (personal) cluster, the file is stored in:
/databricks/driver/logs.txt
and there is no issue accessing it and copying it to DBFS with the dbutils commands.
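For completeness, this is roughly the call that works on the single user cluster (a sketch; the file: prefix is my addition to make the local source explicit):

# Works on the single user cluster, where the file sits in the driver's working directory
dbutils.fs.cp("file:/databricks/driver/logs.txt", "dbfs:/databricks/scripts/logs.txt")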
02-14-2024 08:25 AM
Hi @Retired_mod - thanks for mentioning that. The issue is accessing the local file on the cluster, not the DBFS location.
But, as you said, it still looks like a cluster config issue:
org.apache.spark.api.python.PythonSecurityException: Path 'file:/home/spark-c989284b-a795-4ca0-858e-84/logs.txt' uses an untrusted filesystem 'com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem'
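One workaround I am considering is skipping the filesystem-level copy entirely and pushing the file contents through dbutils.fs.put instead. This is only a sketch, and it assumes plain Python reads of the driver-local file are still allowed on the shared cluster:

# Workaround sketch (assumption: only the filesystem-level cp is blocked,
# not plain Python file access on the driver)
import os

local_log = os.path.join(os.getcwd(), "logs.txt")

with open(local_log, "r") as f:
    contents = f.read()

# dbutils.fs.put writes a string to the target path; True = overwrite
dbutils.fs.put("dbfs:/databricks/scripts/logs.txt", contents, True)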

