Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to read Excel files inside a Databricks notebook?

jeremy98
Contributor III

Hi community,

Is it possible to read Excel files from DBFS using a notebook inside Databricks? If yes, how can I do it?

1 ACCEPTED SOLUTION


Stefan-Koch
Valued Contributor II

In general, you shouldn't use DBFS anymore; use Volumes instead.

But as an example, if I have an Excel file in my Workspace directory, you could do this:


%pip install openpyxl
import pandas as pd

# replace with your path
file_path = "/Workspace/Users/stefan.koch@btelligent.com/excel/FinancialsSampleData.xlsx"

# read the sheet named "Financials1" into a pandas DataFrame
pdf = pd.read_excel(file_path, sheet_name="Financials1")

# convert the pandas DataFrame to a PySpark DataFrame
df = spark.createDataFrame(pdf)

display(df)


Would this work for you, or what is your DBFS path?
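The same pattern also works when the file lives in a Unity Catalog Volume, which is the recommended replacement for DBFS. A minimal sketch, assuming a hypothetical volume main.default.raw_files and openpyxl installed as above:

import pandas as pd

# hypothetical Volume path: catalog "main", schema "default", volume "raw_files"
volume_path = "/Volumes/main/default/raw_files/FinancialsSampleData.xlsx"

# Volumes are exposed as regular file paths, so pandas can read the workbook directly
pdf = pd.read_excel(volume_path, sheet_name="Financials1")

# convert to a PySpark DataFrame
df = spark.createDataFrame(pdf)
display(df)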

 

 


4 REPLIES

jeremy98
Contributor III

Hello,

Thanks for your answer, but the point is that the file is located on DBFS, and it seems that when using serverless compute, the pandas API cannot access DBFS.
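If the workbook has to stay on DBFS for now, one possible workaround (a sketch, assuming your compute can still read that DBFS path and that a Volume such as the hypothetical main.default.raw_files exists) is to copy the file into a Volume first and then read it with pandas from there:

import pandas as pd

# hypothetical source and target paths
dbfs_path = "dbfs:/FileStore/excel/FinancialsSampleData.xlsx"
volume_path = "/Volumes/main/default/raw_files/FinancialsSampleData.xlsx"

# copy the workbook from DBFS into the Volume
dbutils.fs.cp(dbfs_path, volume_path)

# the Volume path behaves like a regular file path, so pandas (with openpyxl installed) can read it
pdf = pd.read_excel(volume_path, sheet_name="Financials1")
df = spark.createDataFrame(pdf)
display(df)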


jeremy98
Contributor III

Amazing, yes that's totally what I need! Thx Stefan!
