I had the same issue: geopandas in Databricks notebooks does not open shapefiles from an Azure Storage mount.
I managed to copy the shapefile to the driver's local filesystem using
dbutils.fs.cp(shapefile_path, f"file:{local_shapefile_copy_dest_path}")
The 'file:' prefix proved to be crucial here: it makes dbutils write to local disk instead of DBFS.
I then read the local copy:
gdf = gpd.read_file(local_shapefile_copy_dest_path)
display(gdf)
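For reference, here is a consolidated sketch of the copy-and-read step. The paths and the sidecar-file loop are illustrative assumptions, not the exact code from my notebook; a shapefile needs its companion files (.shx, .dbf, .prj) next to the .shp, so each one gets copied:

import geopandas as gpd

shapefile_path = "dbfs:/mnt/raw/boundaries/regions.shp"        # hypothetical mount path
local_shapefile_copy_dest_path = "/tmp/regions.shp"            # driver-local destination

base_src = shapefile_path.rsplit(".", 1)[0]
base_dst = local_shapefile_copy_dest_path.rsplit(".", 1)[0]
for ext in ("shp", "shx", "dbf", "prj"):
    # 'file:' makes dbutils write to the driver's local filesystem instead of DBFS
    dbutils.fs.cp(f"{base_src}.{ext}", f"file:{base_dst}.{ext}")

gdf = gpd.read_file(local_shapefile_copy_dest_path)   # read the local copy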
I then copy the results back to the DBFS mount using
dbutils.fs.cp(f"file:{geoparquet_path}", f"{raw_path}{geoparquet_file_basename}.parquet")
@-werners- wrote:
I see you use pandas to read from dbfs.
But pandas will only read from local files;
see this topic also. It is about databricks-connect, but the same principles apply.
So what you should do is first read the file using spark.read.csv and then convert the Spark df to a pandas df.
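For completeness, a minimal sketch of that Spark-first route; the CSV path and the lon/lat column names are assumptions for illustration:

import geopandas as gpd

sdf = spark.read.csv("dbfs:/mnt/raw/points.csv", header=True, inferSchema=True)
pdf = sdf.toPandas()    # collect the data onto the driver as a pandas DataFrame

# If the CSV carries point coordinates, the pandas df can become a GeoDataFrame.
gdf = gpd.GeoDataFrame(pdf, geometry=gpd.points_from_xy(pdf["lon"], pdf["lat"]))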