How to read CSV files stored in my Databricks workspace using a Python script on my local computer?
08-15-2024 06:05 PM - edited 08-15-2024 06:07 PM
I am developing a Python app on my local computer, and I would like it to read some data stored in my Databricks workspace, preferably using pandas. The data are stored in .csv files in the workspace. How can I make this happen? Is it possible to do this via a file URL? A code snippet would be appreciated! Thanks!
08-16-2024 12:25 AM
Hi alexkychen, assuming you have the file saved in DBFS in your Databricks workspace, you can read the file by getting the file's contents in DBFS via the Databricks API -> https://docs.databricks.com/api/workspace/dbfs/read
Here is a simple Python snippet that lets you do this locally. It authenticates with a personal access token and prints the JSON response, whose "data" field holds the Base64-encoded content of the file.
import json
import requests

DATABRICKS_HOST = 'https://<FILL_IN_DATABRICKS_HOST>'
DATABRICKS_TOKEN = '<FILL_IN_TOKEN>'

reqUrl = f"{DATABRICKS_HOST}/api/2.0/dbfs/read"
headersList = {
    # Personal access token sent as a bearer token
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    "Content-Type": "application/json"
}
payload = json.dumps({
    # Absolute DBFS path of the file; adjust to where your file lives
    "path": "/dbfs/tmp/example_folder/test.csv"
})

response = requests.request("GET", reqUrl, data=payload, headers=headersList)
response.raise_for_status()  # Raise on HTTP errors such as a bad token or path
# Print the JSON response; its "data" field is Base64 encoded
print(response.text)
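Since the question asked for pandas, here is a rough sketch of how the Base64 payload could be turned into a DataFrame. It assumes the response JSON carries `bytes_read` and `data` fields and that a single call returns at most 1 MB, as described in the DBFS read API documentation linked above, so it pages through the file with `offset` and `length`. The names `read_dbfs_file` and `dbfs_csv_to_dataframe` and the optional `session` argument are made up for illustration and are not part of any Databricks library.

```python
import base64
import io

import pandas as pd
import requests

# Assumption: the DBFS read endpoint returns at most 1 MB per call.
CHUNK_SIZE = 1024 * 1024


def read_dbfs_file(host, token, path, session=None):
    """Download a DBFS file as bytes by calling /api/2.0/dbfs/read in chunks."""
    http = session or requests.Session()
    headers = {"Authorization": f"Bearer {token}"}
    chunks = []
    offset = 0
    while True:
        response = http.get(
            f"{host}/api/2.0/dbfs/read",
            headers=headers,
            params={"path": path, "offset": offset, "length": CHUNK_SIZE},
        )
        response.raise_for_status()
        body = response.json()
        bytes_read = body.get("bytes_read", 0)
        if bytes_read:
            # "data" holds this chunk of the file, Base64 encoded
            chunks.append(base64.b64decode(body["data"]))
            offset += bytes_read
        if bytes_read < CHUNK_SIZE:
            # A short (or empty) read means the end of the file was reached
            break
    return b"".join(chunks)


def dbfs_csv_to_dataframe(host, token, path, session=None, **read_csv_kwargs):
    """Read a CSV file from DBFS into a pandas DataFrame."""
    raw = read_dbfs_file(host, token, path, session=session)
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)
```

Calling `dbfs_csv_to_dataframe(DATABRICKS_HOST, DATABRICKS_TOKEN, path)` with your own host, token, and file path should give you a DataFrame. Any extra keyword arguments (for example `sep=";"`) are passed straight through to `pd.read_csv`.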
Hope this helps 🙂
08-16-2024 11:36 AM - edited 08-16-2024 11:39 AM
Hi Eni,
Thank you very much for your reply. I also did some research and realized that storing sensitive data (which is my case) in DBFS is no longer recommended by Databricks for security reasons, as stated here: https://docs.databricks.com/en/files/index.html#work-with-files-in-dbfs-mounts-and-dbfs-root. I will look for other ways to store the data on Databricks so that it can be accessed locally and securely.
Anyway, your reply is much appreciated!

