Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Pass Data to a Databricks App?

adam_mich
New Contributor II

I am developing a Databricks application using the Streamlit package. I was able to get a "hello world" app deployed successfully, but now I am trying to pass it data that exists in DBFS on the same instance. When I try to read a CSV saved to DBFS, I get a file-not-found error. I assume a virtual environment is set up during deployment and there is an additional step I need to take to configure the path. Thanks in advance.

10 REPLIES

Walter_C
Databricks Employee

Hello Adam,

So you are running something similar to:

 import streamlit as st
 import pandas as pd

 # Path to the CSV file in DBFS
 file_path = '/dbfs/path/to/your/file.csv'

 # Read the CSV file
 df = pd.read_csv(file_path)

 # Display the dataframe in Streamlit
 st.write(df)

And it is resulting in this file not found issue?

adam_mich
New Contributor II

Yes, exactly. To add more context, the read_csv() line works if I run it in a notebook with the same path, but it does not work once I deploy the application.

Hello Walter,

did you have the possibility to look into this?

Walter_C
Databricks Employee

What if you try to list the file using dbutils.fs.ls("dbfs:/mnt/path/to/data") does it list it?

adam_mich
New Contributor II

Well, I can't even use dbutils in the app. When I try that, I get a `NameError: name 'dbutils' is not defined`. Again, this works in a notebook but not in the app.

If I try os.listdir('/dbfs/'), it again does not find that directory.

Have you found a solution? As far as I can see, the apps run in an environment where DBFS is not mounted. 

The root folder of the environment where the app runs contains only the following directories (note that there is no /dbfs or /Volumes entry):

['/media', '/libx32', '/var', '/sbin', '/bin', '/srv', '/proc', '/opt', '/home', '/etc', '/lib', '/usr', '/boot', '/tmp', '/root', '/sys', '/run', '/dev', '/lib64', '/lib32', '/mnt', '/app', '/databricks']
 
And `/mnt` is empty. 
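A quick diagnostic along these lines (a minimal sketch, nothing Databricks-specific; the function name is mine) shows which candidate mount points actually exist inside the app container:

```python
import os

def available_data_roots(candidates=("/dbfs", "/Volumes", "/mnt")):
    """Report which candidate data mount points exist, and how many
    entries each contains. None means the path is not present at all."""
    report = {}
    for path in candidates:
        if os.path.isdir(path):
            report[path] = len(os.listdir(path))
        else:
            report[path] = None  # not mounted in this environment
    return report

print(available_data_roots())
```

In a notebook this would show /dbfs with entries; in the app environment described above, /dbfs would come back as None and /mnt as 0.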

saurabh18cs
Valued Contributor II

Ensure that the environment where Streamlit is running has access to the DBFS paths. This is typically handled by Databricks, but if you are running Streamlit outside of Databricks, you may need to configure access to DBFS.

If you are running Streamlit outside of Databricks, consider using Databricks Connect to interact with Databricks resources.

That's the point. I am running Streamlit from a Databricks App, so I was wondering if they can propose the "right" way to access DBFS, or whether the intended way to exchange data is through a SQL Warehouse.
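If the SQL Warehouse route is the supported path, a minimal sketch might look like the following. It assumes the databricks-sql-connector package is installed; SERVER_HOSTNAME, HTTP_PATH, and ACCESS_TOKEN are placeholders you would supply from the app's configuration, and the helper names are mine:

```python
import pandas as pd

def rows_to_frame(rows, description):
    """Build a pandas DataFrame from DB-API rows and a cursor description
    (first element of each description tuple is the column name)."""
    columns = [col[0] for col in description]
    return pd.DataFrame(rows, columns=columns)

def read_table(table_name):
    """Query a table through a SQL Warehouse and return it as a DataFrame."""
    from databricks import sql  # provided by databricks-sql-connector
    with sql.connect(server_hostname=SERVER_HOSTNAME,
                     http_path=HTTP_PATH,
                     access_token=ACCESS_TOKEN) as conn:
        with conn.cursor() as cur:
            cur.execute(f"SELECT * FROM {table_name}")
            return rows_to_frame(cur.fetchall(), cur.description)
```

The app would then call something like `st.write(read_table("mycatalog.mydatabase.mytable"))` instead of reading a CSV from DBFS.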

txti
New Contributor III

I have the identical problem in Databricks Apps. I have tried:

  • Reading from DBFS path using mount version `/dbfs/myfolder/myfile` and protocol `dbfs:/myfolder/myfile`
  • Reading from Unity Volumes `/Volumes/mycatalog/mydatabase/myfolder/myfile`
    • Also made sure that the App principal had rights to read from the specific Unity Volume
  • Reading from S3 path `s3://mybucket/mypath/myfile`

None of these methods worked for me, and I cannot use Apps until I have a solution for this.
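For the Unity Volumes case, one avenue worth trying is the Files API exposed by the Databricks Python SDK instead of a filesystem read, since the app's service principal already has a grant on the volume. Whether Apps pick up the principal's credentials from the environment is an assumption here, and volume_path/read_volume_csv are hypothetical helper names:

```python
import io
import pandas as pd

def volume_path(catalog, schema, volume, *parts):
    """Build a Unity Catalog Volumes path like /Volumes/cat/schema/vol/file."""
    return "/".join(["/Volumes", catalog, schema, volume, *parts])

def read_volume_csv(path):
    """Download a CSV from a Unity Volume via the SDK Files API and parse it.
    Assumes WorkspaceClient can authenticate from the app environment."""
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    resp = w.files.download(path)
    return pd.read_csv(io.BytesIO(resp.contents.read()))
```

Usage would be along the lines of `read_volume_csv(volume_path("mycatalog", "mydatabase", "myfolder", "myfile"))`.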
