DBFS and Local File System Doubts

kinsun
New Contributor II

Dear Databricks Expert,

I have some questions about working with DBFS and the local file system.

Case01: Copy a file from ADLS to DBFS. This works with the Python code below:

# Configure OAuth (service principal) access to ADLS Gen2
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "***")
spark.conf.set("fs.azure.account.oauth2.client.secret", "YYY")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/99e838ef-4ec1-4ce2-9229-2efbb56fb03c/oauth2/token")

# Copy the file from ADLS to DBFS
abfssFile = 'abfss://AAA@BBB.dfs.core.windows.net/data/file.csv'
dbfsFile = 'dbfs:/workfile/file.csv'
dbutils.fs.cp(abfssFile, dbfsFile)

Case02: Read the file that is now in DBFS.

with open(dbfsFile, "r", newline='') as csv_file:
  input_data = csv_file.read()
  print(input_data)

Error:

FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/workfile/file.csv'

Later I realized that open() only works with files in the local file system, so I tried to copy the file from DBFS to the local file system.
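As a side note on why open() fails here: it treats its argument as an operating system path, so the `dbfs:/` prefix is read as a directory literally named `dbfs:` rather than as a URI scheme. The sketch below maps a `dbfs:/` URI to a path under `/dbfs`, assuming the cluster exposes DBFS on the driver through the `/dbfs` FUSE mount. That assumption may not hold for every cluster access mode. The helper name `dbfs_uri_to_local_path` and the `mount_root` parameter are hypothetical and used only for illustration.

```python
def dbfs_uri_to_local_path(uri, mount_root="/dbfs"):
    """Translate a 'dbfs:/...' URI into a path under the assumed FUSE mount.

    Hypothetical helper: mount_root="/dbfs" is an assumption about the cluster.
    """
    prefix = "dbfs:"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs: URI: {uri!r}")
    # Drop the scheme and any leading slashes, then re-root under the mount.
    relative = uri[len(prefix):].lstrip("/")
    return mount_root.rstrip("/") + "/" + relative


if __name__ == "__main__":
    local_path = dbfs_uri_to_local_path("dbfs:/workfile/file.csv")
    print(local_path)
    # On a cluster where the mount exists, the file could then be read with:
    #   with open(local_path, "r", newline="") as csv_file:
    #       print(csv_file.read())
```

If the mount is not available, open() on the translated path would fail with the same FileNotFoundError, which is one way to check whether the assumption holds.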

Case03: Copy a file from DBFS to the local file system.

localFile = 'file:///tmp/wf_lfs.csv'
dbutils.fs.cp(dbfsFile, localFile)

Error:

java.lang.SecurityException: Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden

Question 1

I assume that the local file system is the file system on the Spark driver node. Is my understanding correct?

Question 2

Is there any way to read a file directly from DBFS? If not, is it because DBFS is a distributed file system?

Question 3

How can I copy a file from DBFS to the local file system?

Thanks in advance for your help!