
How to read a CSV file from a user's workspace

dev_puli
New Contributor III

Hi!

I have been carrying out a POC, so I created a CSV file in my workspace and tried to read its contents in a Python notebook using the techniques below, but neither worked.

Option 1:

import shutil

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
tmp_file_name = repo_file.replace(".", "").replace("/", "").replace(" ", "")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
# Copy the workspace file into DBFS through the /dbfs local mount
shutil.copyfile(repo_file, f"/dbfs{file_location}")
Option 2:

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"
tmp_file_name = repo_file.replace(".", "").replace("/", "").replace(" ", "")
file_location = f"/FileStore/tmp/{tmp_file_name}"
dbutils.fs.rm("/FileStore/tmp/", True)
dbutils.fs.mkdirs("/FileStore/tmp/")
dbutils.fs.cp(repo_file, f"/dbfs{file_location}")
 
Both options throw the same exception, java.io.FileNotFoundException. I realized the problem is with the source file path: the code above works fine if I read the file from Repos instead of my workspace.
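
For reference, one way to separate a bad path from a path-scheme issue (a sketch, assuming a runtime where /Workspace is mounted on the driver's local filesystem, e.g. DBR 11.3+) is to check the file with plain Python before copying:

import os

repo_file = "/Workspace/Users/u1@org.com/csv files/f1.csv"

# If this prints True, the driver can see the file locally, and the
# FileNotFoundException points at how the copy call interprets the path
# (dbutils.fs treats un-prefixed paths as dbfs:/ paths) rather than at a missing file.
print(os.path.exists(repo_file))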

As an alternative, I uploaded the CSV file to a blob storage account and was able to read it from there without any issues.

I am curious, though, as I believe there must be a way to read the CSV file from my workspace as well.

I would be glad if you could post here how to do so.

Thanks!

4 REPLIES

dev_puli
New Contributor III

I tried the repo_file values below in both options; however, I continue to see the same exception:

import os

# Tried each of these values in turn
repo_file = os.path.abspath("./csv files/f1.csv")
repo_file = os.path.abspath("/Workspace/Users/u1@org.com/csv files/f1.csv")
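
Those two calls resolve as follows (a sketch; the first result depends on the notebook's current working directory):

import os

# The second path is already absolute, so abspath() returns it unchanged;
# the first one is resolved relative to the notebook's working directory.
print(os.getcwd())
print(os.path.abspath("./csv files/f1.csv"))
print(os.path.abspath("/Workspace/Users/u1@org.com/csv files/f1.csv"))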
 

Hi @Dev 

You can read it as shown below on Databricks Runtime 12.2 and above (prefix the path with file:/):

df = spark.read.option("header", "true").csv("file:/Workspace/Users/u1@org.com/csv files/f1.csv")
df.display()
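
The same file:/ prefix also works for the original copy-to-DBFS approach; a minimal sketch, reusing the paths from the question (the destination name is just an example):

# dbutils.fs treats un-prefixed paths as dbfs:/, so point the source at the
# driver's local filesystem explicitly and copy into a dbfs:/ destination.
src = "file:/Workspace/Users/u1@org.com/csv files/f1.csv"
dst = "dbfs:/FileStore/tmp/f1.csv"
dbutils.fs.cp(src, dst)

# The copied file can then be read back from DBFS as usual.
df = spark.read.option("header", "true").csv(dst)
df.display()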

Kaniz
Community Manager

Hi @dev_puli, Certainly! Let’s explore how you can read a CSV file from your workspace in Databricks.

When reading a CSV file in Databricks, you need to ensure that the file path is correctly specified. 

Here are some steps and examples to help you achieve this:

Relative Path:

  • If your CSV file is located within your workspace, you can use a relative path to access it.
  • The relative path starts from the current working directory (where your notebook is located).
  • For example, if your CSV file is in a folder named data_folder, you can read it like this:
    import pandas as pd
    df = pd.read_csv('./data_folder/data.csv')

Absolute Path:

  • You can also use an absolute path to specify the location of the CSV file directly.
  • Make sure to provide the full path to the file.
  • For example:
    import pandas as pd
    file_location = "/Workspace/Users/u1@org.com/csv files/f1.csv"
    df = pd.read_csv(file_location)

Databricks File System (DBFS):

  • Databricks provides a distributed file system called DBFS.
  • You can use the /dbfs prefix to read files stored in DBFS with local file APIs such as pandas.
  • For example:
    import pandas as pd
    dbfs_file_location = "/dbfs/FileStore/tmp/f1.csv"
    df = pd.read_csv(dbfs_file_location)

If you encounter any issues, ensure the file exists, and the path is correctly specified. Happy data exploration! 📊🐍
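
If it is unclear which of these path forms applies in a given setup, one quick check (a sketch, reusing example paths from this thread) is to test each candidate with os.path.exists before reading it:

import os

candidates = [
    "./data_folder/data.csv",                        # relative to the notebook's working directory
    "/Workspace/Users/u1@org.com/csv files/f1.csv",  # workspace file via the driver's local mount
    "/dbfs/FileStore/tmp/f1.csv",                    # DBFS file via the /dbfs FUSE mount
]

for path in candidates:
    print(path, os.path.exists(path))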

dev_puli
New Contributor III

@Kaniz I copied the file path and used the same, but it didn't help. It works fine if I copy the file path from Repos, but not from the user's workspace area.

[Screenshot attached: dev_puli_0-1701276808912.png]

 

 
