
DBFS and Local File System Doubts

kinsun
New Contributor II

Dear Databricks Expert,

I got some doubts when dealing with DBFS and Local File System.

Case01: Copy a file from ADLS to DBFS. I am able to do so with the Python code below:

# Configure OAuth access to ADLS Gen2
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "***")
spark.conf.set("fs.azure.account.oauth2.client.secret", "YYY")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/99e838ef-4ec1-4ce2-9229-2efbb56fb03c/oauth2/token")

# Copy the file from ADLS (abfss) to DBFS
abfssFile = 'abfss://AAA@BBB.dfs.core.windows.net/data/file.csv'
dbfsFile = 'dbfs:/workfile/file.csv'
dbutils.fs.cp(abfssFile, dbfsFile)
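As an optional check (assuming the copy above succeeded), listing the target folder shows the file now sitting in DBFS:

# Optional: list the DBFS target folder to confirm the copied file is there
display(dbutils.fs.ls('dbfs:/workfile/'))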

Case02: Read the file which is now in the DBFS.

with open(dbfsFile, "r", newline='') as csv_file:
  input_data = csv_file.read()
  print(input_data)

Error:

FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/workfile/file.csv'

Later I realized that the open() command works with files in the Local File System, so I tried to copy the file from DBFS to the Local File System.

Case03: Copy a file from DBFS to the Local File System

localFile = 'file:///tmp/wf_lfs.csv'

dbutils.fs.cp(dbfsFile, localFile)

Error:

java.lang.SecurityException: Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden

Question 1

Regarding the Local File System: I assume that it is the file system on the Spark driver node. Is my understanding correct?

Question 2

Is there any way to read a file directly in DBFS? If not, is it because DBFS is a distributed file system?

Question 3

How to copy a file from DBFS to the Local File System?

Thanks a lot for your help in advance!

1 ACCEPTED SOLUTION


madhav_dhruve
New Contributor III

Hi, I was able to solve the issue using a single user cluster, which has the necessary privileges to access the local file system. If you want to use a shared cluster, you need an init script that grants permissions on a new folder you create inside the temp directory.

Example - 

#!/bin/bash
# Allow traffic to and from the cloud instance metadata service
sudo iptables -A INPUT -s 169.254.169.254 -j ACCEPT
sudo iptables -A OUTPUT -d 169.254.169.254 -j ACCEPT
# Create a world-writable scratch folder on the node's local disk
mkdir -p /tmp/myfiles
chmod -R 0777 /tmp/myfiles
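Once the init script has run and the cluster is restarted, the new folder can be used as the local target (a minimal sketch, reusing the dbfs:/workfile/file.csv path from the original question):

# Copy from DBFS into the folder created by the init script on the node's local disk
dbutils.fs.cp('dbfs:/workfile/file.csv', 'file:///tmp/myfiles/file.csv')

# Read it back with plain Python from the local disk
with open('/tmp/myfiles/file.csv', 'r') as f:
    print(f.read())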


5 REPLIES

Anonymous
Not applicable

@KS LAU:

Answer 1:

Yes, you are correct. The local file system refers to the file system on the Spark driver node. It is the file system of the machine where the driver process runs, and it is where driver-side code (for example, Python's open()) reads and writes files.
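As a small illustration (a sketch only, using a hypothetical file name; the exact behaviour depends on your cluster configuration), a file written with plain Python ends up on the driver's local disk rather than in DBFS:

import os

# Plain Python I/O runs on the driver, so this file lands on the driver's local disk
with open('/tmp/driver_local_demo.txt', 'w') as f:
    f.write('written on the driver node')

print(os.path.exists('/tmp/driver_local_demo.txt'))  # True, seen through the local file system
# The same file does not appear under dbfs:/tmp/, because /tmp here is the driver's own disk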

Answer 2:

Yes, you can read a file directly from DBFS using the Databricks file system utilities (dbutils.fs). For example, the dbutils.fs.head command previews the first n bytes of a file in DBFS:

dbfs_file = "/mnt/data/myfile.csv"
dbutils.fs.head(dbfs_file, 100)

This will preview the first 100 bytes of the file /mnt/data/myfile.csv in DBFS.
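Another option, assuming the /dbfs FUSE mount is available on your cluster (it is restricted on some shared access mode clusters), is to address the DBFS path through a /dbfs/... prefix so that plain Python open() can read it:

# Read a DBFS file with plain Python via the /dbfs FUSE mount (assumes the mount is enabled)
with open('/dbfs/workfile/file.csv', 'r', newline='') as csv_file:
    print(csv_file.read())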

Answer 3:

To copy a file from DBFS to the local file system, you can use the dbutils.fs.cp command with the file:/ scheme to specify the local file system. Here is an example:

dbfs_file = "/mnt/data/myfile.csv"
local_file = "file:///tmp/myfile.csv"
dbutils.fs.cp(dbfs_file, local_file)

This will copy the file /mnt/data/myfile.csv in DBFS to /tmp/myfile.csv in the local file system. Note that you need write permission on the local file system to write the file. Also, make sure that the file:/// scheme is used to specify the local file system.
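As a quick sanity check (assuming the copy above succeeded and local file system access is permitted on your cluster), the local copy can be verified with plain Python:

import os

# Plain Python uses the bare local path; the file:/// prefix is only needed for dbutils
print(os.path.exists('/tmp/myfile.csv'))  # True if the copy succeeded

with open('/tmp/myfile.csv', 'r') as f:
    print(f.read())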

Hi,

I need to uncompress files stored in S3, so I need to copy them to the local file system.

When I use dbutils.fs.cp(dbfs_file, local_file) with dbfs_file as s3://path_to_file or dbfs://path_to_file and local_file as file:///tmp/path_to_file, I am getting the error below:

ERROR - java.lang.SecurityException: Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden

How can I apply write permissions and resolve this issue?


Azde
New Contributor II

Hi madhav,

Were you able to resolve this? Please post here and let us know. I am facing the same error.

madhav_dhruve
New Contributor III

Hi, I was able to solve the issue using a single user cluster, which has the necessary privileges to access the local file system. If you want to use a shared cluster, you need an init script that grants permissions on a new folder you create inside the temp directory.

Example - 

#!/bin/bash
# Allow traffic to and from the cloud instance metadata service
sudo iptables -A INPUT -s 169.254.169.254 -j ACCEPT
sudo iptables -A OUTPUT -d 169.254.169.254 -j ACCEPT
# Create a world-writable scratch folder on the node's local disk
mkdir -p /tmp/myfiles
chmod -R 0777 /tmp/myfiles

Anonymous
Not applicable

Hi @KS LAU,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
