Re: Read file from dbfs with pd.read_csv() using d...

Anonymous · ‎11-25-2021

Hi,

After some research, I have found out that the pandas API reads only local files. This means that even if a read_csv command works in the Databricks Notebook environment, it will not work when using databricks-connect (pandas reads locally from within the notebook environment).

A work around is to use the pyspark spark.read.format('csv') API to read the remote files and append a ".toPandas()" at the end so that we get a pandas dataframe.

df_pandas = spark.read.format('csv').options(header='true').load('path/in/the/remote/dbfs/filesystem/').toPandas()

View solution in original post