-werners-
Esteemed Contributor III

I see you use pandas to read from dbfs.

But pandas will only read from local files; see this topic also. It is about databricks-connect, but the same principles apply.

So what you should do is first read the file using spark.read.csv and then convert the Spark DataFrame to a pandas DataFrame.
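
A minimal sketch of that suggestion, assuming a notebook where a SparkSession named `spark` already exists. The helper name `read_csv_as_pandas` and its default option values are illustrative assumptions, not something from the original post. Keep in mind that `toPandas()` collects every row on the driver, so this only suits data that fits in driver memory.

```python
def read_csv_as_pandas(spark, path, header=True, infer_schema=True):
    """Read a CSV with Spark, then convert the result to a pandas DataFrame.

    `spark` is an existing SparkSession (Databricks notebooks provide one).
    `path` is resolved by Spark, so a DBFS location can be passed directly.
    """
    spark_df = spark.read.csv(path, header=header, inferSchema=infer_schema)
    # toPandas() pulls all rows to the driver; use it only for small data.
    return spark_df.toPandas()
```

In a notebook this could be called as `pdf = read_csv_as_pandas(spark, "/FileStore/tables/CREDIT_1.CSV")`, after which `pdf` is an ordinary pandas DataFrame.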

Eagle78
New Contributor III

I had the same issue: geopandas in Databricks notebooks does not open shapefiles from an Azure Storage mount.
I managed to copy the shapefile to the Databricks workspace using

dbutils.fs.cp(shapefile_path, f"file:{local_shapefile_copy_dest_path}")

The 'file:' prefix proved to be crucial here. I then read it with:

gdf = gpd.read_file(shapefile_path.replace('dbfs:', ''))
gdf.display()

I copy the results back to the dbfs mount using

dbutils.fs.cp(f"file:{geoparquet_path}", f"{raw_path}{geoparquet_file_basename}.parquet")
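
The role of the prefixes in the two copy calls can be sketched with a pair of small helpers. This is an illustration under assumptions: the names `to_local_uri` and `to_dbfs_uri` and the example paths are hypothetical, and it assumes `dbutils.fs` treats un-prefixed paths as DBFS paths, which would explain why the driver's local disk needs an explicit `file:` scheme.

```python
def to_local_uri(path):
    """Return `path` marked as a driver-local file for dbutils.fs calls."""
    if path.startswith("file:"):
        return path
    if path.startswith("dbfs:"):
        raise ValueError(f"not a local path: {path}")
    return f"file:{path}"


def to_dbfs_uri(path):
    """Return `path` with an explicit dbfs: scheme."""
    if path.startswith("dbfs:"):
        return path
    if path.startswith("file:"):
        raise ValueError(f"not a DBFS path: {path}")
    return f"dbfs:{path}"


# Hypothetical example paths, for illustration only.
src = to_dbfs_uri("/mnt/raw/area.shp")
dst = to_local_uri("/tmp/area.shp")
print(src, "->", dst)
# In a Databricks notebook, the copy itself would be: dbutils.fs.cp(src, dst)
```

Making both schemes explicit keeps the direction of each copy visible at the call site, which is easy to get wrong when one side is left un-prefixed.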

 



Eagle78
New Contributor III

I convert to Parquet using

gdf.to_parquet(f"/dbfs{raw_path}/{file_name}.parquet") 
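
The `/dbfs` prefix in that path refers to the local mount of DBFS on the driver, which is what lets a local-file library such as geopandas write there directly. Below is a small sketch of the path conversion, assuming the cluster exposes that mount; the helper name `dbfs_to_local_path` is hypothetical.

```python
def dbfs_to_local_path(path):
    """Map a DBFS path to its local /dbfs mount equivalent.

    "dbfs:/mnt/raw/x.parquet" and "/mnt/raw/x.parquet" both become
    "/dbfs/mnt/raw/x.parquet"; paths already under /dbfs/ are unchanged.
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        path = "/" + path
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path
    return "/dbfs" + path
```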

 

Alexis
New Contributor III

Hi

you can try:

my_df = (
    spark.read.format("csv")
    .option("inferSchema", "true")  # infer the column types from your data
    .option("sep", ",")             # if your file uses "," as the separator
    .option("header", "true")       # if your file has the header in the first row
    .load("/FileStore/tables/CREDIT_1.CSV")
)

display(my_df)

As you can see above, my_df is a Spark DataFrame, and from there you can continue with your code.
