How to read a CSV file from a user's workspace

dev_puli
New Contributor III

Hi!

I am working on a POC. I created a CSV file in my workspace and tried to read its content from a Python notebook using the two approaches below, but neither worked.

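Option 1:

import shutil

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
# Build a flat temp file name by stripping dots, slashes, and spaces from the source path
tmp_file_name = repo_file.replace(".","").replace("/","").replace(" ","")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
# Copy through the driver's local file system; /dbfs is the local mount of DBFS
shutil.copyfile(repo_file, f"/dbfs{file_location}")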

Option 2:

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
# Build a flat temp file name by stripping dots, slashes, and spaces from the source path
tmp_file_name = repo_file.replace(".","").replace("/","").replace(" ","")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
# Same copy as Option 1, but using dbutils instead of shutil
dbutils.fs.cp(repo_file, f"/dbfs{file_location}")
 
Both options throw the same exception, java.io.FileNotFoundException. I realized the problem is with the source file path: the code above works fine if I read the file from Repos instead of my workspace.

As an alternative, I uploaded the CSV file to a blob storage account and was able to read it without any issues.

I am still curious, though, as I believe there must be a way to read the CSV file from my workspace as well.

I would be glad if you could post how to do so.

Thanks!

dev_puli
New Contributor III

I tried the repo_file values below in both options, but I still see the same exception:

import os

repo_file = os.path.abspath("./csv files/f1.csv")
repo_file = os.path.abspath("/Workspace/Users/u1@org.com/csv files/f1.csv")
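
A side note on why the second value cannot behave differently from the original one: os.path.abspath only resolves relative paths against the current working directory, and an already-absolute path comes back unchanged. A minimal sketch, assuming a Linux driver; it only manipulates strings and never touches the file system, and the variable names are illustrative:

```python
import os

workspace_csv = "/Workspace/Users/u1@org.com/csv files/f1.csv"

# abspath() leaves an already-absolute, already-normalized path as it is,
# so this is the same string that Option 1 and Option 2 started with.
resolved = os.path.abspath(workspace_csv)
same_as_input = resolved == workspace_csv

# A relative path is resolved against the current working directory,
# so the result depends on where the notebook process is running.
relative_resolved = os.path.abspath("./csv files/f1.csv")
ends_with_relative_part = relative_resolved.endswith("/csv files/f1.csv")
```

So wrapping the absolute path in abspath is a no-op, and the relative form only helps if the working directory happens to be the folder that contains "csv files".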
 

Hi @Dev 

You can read it as shown below on a cluster with runtime 12.2 or above, by adding the file:/ prefix to the path:

df = spark.read.option("header", "true").csv("file:/Workspace/Users/u1@org.com/csv files/f1.csv")
df.display()
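
If the prefix has to be added in several places, the same idea could be wrapped in a small helper. This is only a sketch: to_file_uri is a hypothetical name, not a Databricks API, and the Spark call is left as a comment because spark is only predefined inside a notebook or cluster session.

```python
def to_file_uri(path: str) -> str:
    """Prefix an absolute driver-local path with the file: scheme (hypothetical helper)."""
    if path.startswith("file:"):
        return path  # already prefixed, leave it alone
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got: {path!r}")
    return "file:" + path

workspace_csv = "/Workspace/Users/u1@org.com/csv files/f1.csv"
csv_uri = to_file_uri(workspace_csv)

# In a Databricks notebook the read would then look like the line above:
# df = spark.read.option("header", "true").csv(csv_uri)
```

The helper deliberately does not escape the space in "csv files", since the path in this thread was reported to work with the space left as is.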

This worked for me, thanks: adding file:/ before /Workspace.

PabloCSD
Valued Contributor II

Had the same issue today. Adding the "file:/" prefix worked for me too, as @Krishnamatta said:

file:/Workspace/Users/u1@org.com/csv files/f1.csv

Thanks

dev_puli
New Contributor III

@Retired_mod I copied the file path and used it as is, but it didn't help. It works fine if I copy the file path from Repos, but not from the user's workspace area.

xiangzhu
Contributor III

For Unity Catalog-enabled clusters with the default security permissions, I think we can no longer access files this way.

MujtabaNoori
New Contributor III

Hi @Dev ,

Generally, the Spark reader APIs point to DBFS by default. So, to read a file from the user workspace, we need to prepend 'file:/' to the path.
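
As a rough illustration of that rule (this is not Spark's actual path resolution, and effective_scheme is a made-up helper name), the sketch below shows which file system a path string would be taken to mean, assuming scheme-less paths default to DBFS. The dbfs:/ path is just an example value.

```python
def effective_scheme(path: str, default: str = "dbfs") -> str:
    """Return the scheme written in the path, or the assumed default if none."""
    scheme, sep, _ = path.partition(":/")
    # A scheme counts as present only if alphabetic text precedes ":/".
    return scheme if sep and scheme.isalpha() else default

paths = [
    "/Workspace/Users/u1@org.com/csv files/f1.csv",       # no scheme
    "file:/Workspace/Users/u1@org.com/csv files/f1.csv",  # explicit local file system
    "dbfs:/FileStore/tmp/f1.csv",                         # explicit DBFS
]
schemes = {p: effective_scheme(p) for p in paths}
```

Under that assumption, the first path is looked up in DBFS, where no /Workspace folder exists, which would explain the FileNotFoundException in the original question.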

Thanks