Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to read a CSV file from a user's workspace

dev_puli
New Contributor III

Hi!

I have been carrying out a POC, so I created a CSV file in my workspace and tried to read its contents in a Python notebook using the techniques below, but neither worked.

Option 1:

import shutil

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
# Build a flat temporary file name from the source path
tmp_file_name = repo_file.replace(".","").replace("/","").replace(" ","")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
# shutil works on local paths, so the DBFS target goes through the /dbfs mount
shutil.copyfile(repo_file, f"/dbfs{file_location}")
Option 2:

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
tmp_file_name = repo_file.replace(".","").replace("/","").replace(" ","")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
# dbutils.fs paths are already DBFS paths, so the target needs no /dbfs prefix
dbutils.fs.cp(repo_file, file_location)
 
Both options throw the same exception, java.io.FileNotFoundException. I realized the problem is the source file path: the code above works fine if I read the file from Repos instead of my workspace.
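Since the failure comes from the source path, one way to isolate it is to skip DBFS entirely and read the file with plain Python. This is a minimal sketch, assuming the cluster exposes workspace files on the local filesystem under /Workspace (the replies below suggest Runtime 12.2 or above); `read_local_csv` is a hypothetical helper name, not a Databricks API.

```python
import csv
from pathlib import Path


def read_local_csv(path):
    """Read a CSV file from the local filesystem into a list of dicts.

    Raises FileNotFoundError up front, with the path in the message, so a
    source path that is not visible from the notebook is easy to spot.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"not visible on the local filesystem: {p}")
    # newline="" is what the csv module expects for files it parses
    with p.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical usage on Databricks, with the path from the question:
# rows = read_local_csv("/Workspace/Users/u1@org.com/csv files/f1.csv")
```

If this raises on the workspace path but works on a Repos path, the problem is how the cluster exposes the file, not the CSV itself.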

As an alternative, I uploaded the CSV file to a blob storage account and was able to read it without any issues.

I am curious, because I believe there must be a way to read the CSV file from my workspace as well.

I would be glad if you could post how to do so here.

Thanks!

4 REPLIES

dev_puli
New Contributor III

I tried the repo_file values below in both options, but I still see the same exception:

import os

repo_file = os.path.abspath("./csv files/f1.csv")
repo_file = os.path.abspath("/Workspace/Users/u1@org.com/csv files/f1.csv")
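Note that os.path.abspath does not look for a file: it joins a relative path onto the process's current working directory and normalizes the result, so the first value depends on where the notebook process is running, which may not be the notebook's folder. The second value is already absolute, so abspath returns it unchanged. A stdlib-only illustration, where `show_abspath` is a hypothetical helper:

```python
import os
import tempfile


def show_abspath(relative, cwd):
    """Return what os.path.abspath(relative) gives when run from cwd."""
    original = os.getcwd()
    try:
        os.chdir(cwd)
        # abspath only joins onto the current directory and normalizes;
        # it never checks that the file exists
        return os.path.abspath(relative)
    finally:
        os.chdir(original)  # restore the caller's working directory


with tempfile.TemporaryDirectory() as d:
    resolved = show_abspath("./csv files/f1.csv", d)
    print(resolved)
    print(os.path.exists(resolved))  # False: no file was created there
```

So wrapping either path in abspath cannot change whether the file is found; it only changes which directory a relative path is resolved against.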
 

Hi @Dev 

You can read it as shown below on a cluster running Databricks Runtime 12.2 or above, by prefixing the path with file:/

df = spark.read.option("header", "true").csv("file:/Workspace/Users/u1@org.com/csv files/f1.csv")
df.display()
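The key change is the URI scheme: without one, Spark on Databricks resolves the path against DBFS, while file: points it at the local filesystem, where (per this reply, on Runtime 12.2 and above) workspace files are exposed. A minimal helper, with the hypothetical name `workspace_uri`, that adds the prefix only to /Workspace paths that do not already carry a scheme:

```python
import re

# Matches a leading URI scheme such as "file:", "dbfs:" or "abfss:"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def workspace_uri(path):
    """Return path with the file: scheme added if it is a bare /Workspace path."""
    if _SCHEME.match(path):
        return path  # already has a scheme; calling twice is harmless
    if path == "/Workspace" or path.startswith("/Workspace/"):
        return "file:" + path
    return path  # leave other paths (e.g. DBFS paths) untouched
```

On Databricks it would be used as `spark.read.option("header", "true").csv(workspace_uri(repo_file))`; everything else in the answer stays the same.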

This worked for me, thanks! The fix was adding file:/ before /Workspace.

dev_puli
New Contributor III

@Retired_mod I copied the file path and used it as-is, but it didn't help. It works fine when I copy the file path from Repos, but not from the user's workspace area.
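When a copied path still fails, it can help to check how much of it is visible from the notebook before involving Spark or dbutils. This sketch, using a hypothetical helper `deepest_existing_parent`, walks up the path until it reaches a part that exists and reports the first part that does not, which shows where the path stops resolving (for example a mistyped folder name, or a /Workspace tree the cluster does not expose). It only checks the local filesystem, so it says nothing about DBFS paths.

```python
import os
from pathlib import PurePosixPath


def deepest_existing_parent(path):
    """Return (existing_prefix, first_missing_part) for a local path.

    first_missing_part is None when the whole path exists; existing_prefix
    is None when not even the first component exists.
    """
    p = PurePosixPath(path)
    if os.path.exists(str(p)):
        return str(p), None
    parts = p.parts
    # Try ever-shorter prefixes: /a/b/c, then /a/b, then /a, then /
    for i in range(len(parts) - 1, 0, -1):
        prefix = PurePosixPath(*parts[:i])
        if os.path.exists(str(prefix)):
            return str(prefix), parts[i]
    return None, parts[0]


# Hypothetical usage in the notebook:
# print(deepest_existing_parent("/Workspace/Users/u1@org.com/csv files/f1.csv"))
```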
