Data Engineering

How to Pass Data to a Databricks App?

adam_mich
New Contributor II

I am developing a Databricks App using the Streamlit package. I was able to get a "hello world" app deployed successfully, but now I am trying to read data that already exists in DBFS in the same workspace. When I try to read a CSV saved to DBFS, I get a file-not-found error. I am assuming a separate environment is set up during deployment and there is an additional step I need to take to configure the path. Thanks in advance.

10 REPLIES

Walter_C
Databricks Employee

Hello Adam,

So you are running something similar to:

 import streamlit as st
 import pandas as pd

 # Path to the CSV file in DBFS
 file_path = '/dbfs/path/to/your/file.csv'

 # Read the CSV file
 df = pd.read_csv(file_path)

 # Display the dataframe in Streamlit
 st.write(df)

And it is resulting in this file-not-found issue?

adam_mich
New Contributor II

Yes, exactly. To add more context: the read_csv() line works if I run it in a notebook with the same path, but it stops working once I deploy the application.

Hello Walter,

did you have the possibility to look into this?

Walter_C
Databricks Employee

What if you try to list the file with `dbutils.fs.ls("dbfs:/mnt/path/to/data")`? Does it show up?

adam_mich
New Contributor II

Well, I can't even use dbutils in the app. When I try that, I get `NameError: name 'dbutils' is not defined`. Again, this works in a notebook but not in the app.

If I try `os.listdir('/dbfs/')`, it again does not find that directory.

Have you found a solution? As far as I can see, Apps run in an environment where DBFS is not mounted.

The environment where the app runs has only the following directories in its root folder; notably, there is no `/dbfs`:

['/media', '/libx32', '/var', '/sbin', '/bin', '/srv', '/proc', '/opt', '/home', '/etc', '/lib', '/usr', '/boot', '/tmp', '/root', '/sys', '/run', '/dev', '/lib64', '/lib32', '/mnt', '/app', '/databricks']

And `/mnt` is empty.
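
For reference, a minimal diagnostic sketch along these lines (using only os and streamlit, which the app already ships with) reproduces these checks from inside the app:

 import os
 import streamlit as st

 # List the root of the app container; /dbfs did not appear in my case.
 st.write("root directories:", sorted(os.listdir("/")))

 # Check the usual data entry points explicitly.
 st.write("/dbfs exists:", os.path.exists("/dbfs"))        # DBFS FUSE mount
 st.write("/Volumes exists:", os.path.exists("/Volumes"))  # Unity Catalog volumes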

saurabh18cs
Honored Contributor

 Ensure that the environment where Streamlit is running has access to the DBFS paths. This is typically handled by Databricks, but if you are running Streamlit outside of Databricks, you may need to configure access to DBFS yourself.

 If you are running Streamlit outside of Databricks, consider using Databricks Connect to interact with Databricks resources.
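
 For that second case, a rough sketch with Databricks Connect might look like the code below. It assumes the databricks-connect package is installed and authentication is already configured (for example via a Databricks config profile or environment variables); the table name is a placeholder.

 from databricks.connect import DatabricksSession

 # Build a remote Spark session against the configured Databricks workspace.
 spark = DatabricksSession.builder.getOrCreate()

 # Placeholder table name; replace with your own catalog.schema.table.
 df = spark.read.table("mycatalog.mydatabase.mytable").limit(10).toPandas()
 print(df)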

That's the point: I am running Streamlit from a Databricks App, so I was wondering whether Databricks can propose the "right" way to access DBFS, or whether the intended way to exchange data is through a SQL Warehouse.
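
If the SQL Warehouse route is the intended one, I imagine something like this rough sketch with the databricks-sql-connector package; the environment variable names and the table are placeholders I chose, not values an App injects automatically:

 import os
 import pandas as pd
 from databricks import sql

 # Placeholder env vars: supply your warehouse hostname, HTTP path, and a token.
 with sql.connect(
     server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
     http_path=os.environ["DATABRICKS_HTTP_PATH"],
     access_token=os.environ["DATABRICKS_TOKEN"],
 ) as conn:
     with conn.cursor() as cur:
         cur.execute("SELECT * FROM mycatalog.mydatabase.mytable LIMIT 100")
         # Standard PEP 249 access: rows plus column names from the cursor.
         df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])

 print(df.head())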

txti
New Contributor III

I have the identical problem in Databricks Apps. I have tried...

  • Reading from a DBFS path, in both the mount form `/dbfs/myfolder/myfile` and the protocol form `dbfs:/myfolder/myfile`
  • Reading from Unity Volumes `/Volumes/mycatalog/mydatabase/myfolder/myfile`
    • Also made sure that the App principal had rights to read from the specific Unity Volume
  • Reading from S3 path `s3://mybucket/mypath/myfile`

None of these methods worked for me, and I cannot use Apps until I have a solution for this.
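
One avenue that may still be worth a try (a sketch only, I have not confirmed it in Apps) is reading the file through the Files API in the databricks-sdk package instead of through the filesystem, since that relies only on the app principal's read grant on the volume; the volume path is the same placeholder as above:

 import pandas as pd
 from databricks.sdk import WorkspaceClient

 # Inside a Databricks App the client should pick up credentials from the
 # injected environment; elsewhere you must configure authentication yourself.
 w = WorkspaceClient()

 # Download via the Files API rather than a local mount.
 resp = w.files.download("/Volumes/mycatalog/mydatabase/myfolder/myfile")
 df = pd.read_csv(resp.contents)  # resp.contents is a file-like stream
 print(df.head())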
