04-27-2023 07:58 AM
Dear Databricks Expert,
I have some questions about working with DBFS and the local file system.
Case01: Copy a file from ADLS to DBFS. I am able to do so with the Python code below:
#
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "***")
spark.conf.set("fs.azure.account.oauth2.client.secret", "YYY")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/99e838ef-4ec1-4ce2-9229-2efbb56fb03c/oauth2/token")
#
abfssFile = 'abfss://AAA@BBB.dfs.core.windows.net/data/file.csv'
dbfsFile = 'dbfs:/workfile/file.csv'
dbutils.fs.cp(abfssFile, dbfsFile)
Case02: Read the file which is now in the DBFS.
with open(dbfsFile, "r", newline='') as csv_file:
    input_data = csv_file.read()
    print(input_data)
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/workfile/file.csv'
Later I realized that open() works on files in the local file system, so I tried to copy the file from DBFS to the local file system.
Case03: Copy a file from DBFS to the Local File System
localFile = 'file:///tmp/wf_lfs.csv'
dbutils.fs.cp(dbfsFile, localFile)
Error:
java.lang.SecurityException: Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden
Question 1
Local file system: I assume this is the file system on the Spark driver node. Is my understanding correct?
Question 2
Is there any way to read a file directly in DBFS? If not, is it because DBFS is a distributed file system?
Question 3
How to copy a file from DBFS to the Local File System?
Thanks a lot for your help in advance!
04-28-2023 10:24 AM
@KS LAU :
Answer 1:
Yes, you are correct. The local file system refers to the file system on the Spark driver node. It is where the driver process runs and where your Python code can read and write files.
Answer 2:
Yes, you can read a file directly from DBFS. You can use the Databricks File System (DBFS) API to read files from DBFS. You can also use the dbutils.fs.head command to preview the first n bytes of a file in DBFS. Here is an example:
dbfs_file = "/mnt/data/myfile.csv"
dbutils.fs.head(dbfs_file, 100)
This will preview the first 100 bytes of the file /mnt/data/myfile.csv in DBFS.
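As an aside: on clusters where local file system access is permitted (single-user clusters, for example), the DBFS root is also exposed through the /dbfs FUSE mount, so Python's built-in open() can read the file directly once you translate the dbfs:/ URI into a /dbfs/ path. A minimal sketch, where to_fuse_path is a hypothetical helper and not a Databricks API:

```python
# Sketch: map a dbfs:/ URI onto its /dbfs FUSE-mount path so plain Python
# file I/O can read it. Assumes the FUSE mount is available on the cluster.

def to_fuse_path(dbfs_uri: str) -> str:
    """Translate a dbfs:/ URI into the corresponding /dbfs local path."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {dbfs_uri}")
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")

# On a cluster with the FUSE mount this would read the file directly:
# with open(to_fuse_path("dbfs:/workfile/file.csv"), "r", newline="") as f:
#     print(f.read())
```

This is exactly why the original open('dbfs:/workfile/file.csv') failed: open() knows nothing about the dbfs:/ scheme, but it can read the same bytes through the FUSE path.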
Answer 3:
To copy a file from DBFS to the local file system, you can use the dbutils.fs.cp command with the file:/ scheme to specify the local file system. Here is an example:
dbfs_file = "/mnt/data/myfile.csv"
local_file = "file:///tmp/myfile.csv"
dbutils.fs.cp(dbfs_file, local_file)
This will copy the file /mnt/data/myfile.csv in DBFS to /tmp/myfile.csv in the local file system. Note that you need write permission on the local file system to write the file. Also, make sure the file:/// scheme is used to specify the local path.
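The same copy can also be done with ordinary Python file operations when both the /dbfs FUSE mount and local filesystem access are available on the cluster. A sketch, assuming those conditions hold (copy_to_local is a hypothetical helper, not a Databricks API):

```python
# Sketch: copy a file from a /dbfs FUSE path to the driver's local disk
# using the standard library. Assumes local filesystem access is allowed
# on this cluster (it is forbidden on some shared-access clusters, which
# produces the SecurityException seen earlier in this thread).
import os
import shutil

def copy_to_local(fuse_src: str, local_dst: str) -> str:
    """Copy a file from a /dbfs FUSE path to a local path on the driver."""
    dst_dir = os.path.dirname(local_dst)
    if dst_dir:
        os.makedirs(dst_dir, exist_ok=True)  # ensure the target folder exists
    shutil.copy(fuse_src, local_dst)
    return local_dst

# On a cluster this would be e.g.:
# copy_to_local("/dbfs/mnt/data/myfile.csv", "/tmp/myfile.csv")
```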
07-17-2023 10:28 PM
Hi,
I need to uncompress files in S3 so need to copy files to local file system.
When I use dbutils.fs.cp(dbfs_file, local_file) with dbfs_file as s3://path_to_file or dbfs://path_to_file and local_file as file:///tmp/path_to_file, I get the error below:
ERROR - java.lang.SecurityException: Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden
How can I apply write permissions and resolve this issue?
09-02-2023 09:00 AM
Hi madhav,
Were you able to resolve this? Please post here and let us know. I am facing the same error.
09-03-2023 08:48 PM
Hi, I was able to solve the issue using a single user cluster. It has the necessary privileges to access the local file system. If you want to use it with a shared cluster, you need an init script that grants permissions on a new folder you create inside the temp directory.
Example -
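A rough sketch of what such an init script could look like (this is my guess at the approach, not the original poster's script; the folder name /tmp/user_scratch is an arbitrary assumption):

```shell
#!/bin/bash
# Init-script sketch: create a world-writable working folder under /tmp on
# each node so notebook code on a shared cluster can write into it.
# The folder name is an assumption - pick your own.
mkdir -p /tmp/user_scratch
chmod 777 /tmp/user_scratch
```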
04-28-2023 10:34 PM
Hi @KS LAU
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!