10-26-2021 01:40 AM
10-26-2021 03:57 AM
I see that you are using databricks-course-cluster, which probably has limited functionality, and I am not sure where DBFS is mounted there. When you use dbutils, it displays paths on the DBFS mount (the DBFS file system).
Please use Spark code instead of pandas so the file is read correctly:
df = spark.read.csv('dbfs:/FileStore/tables/world_bank.csv')
display(df)
10-27-2021 07:17 AM
05-24-2022 09:13 AM
05-24-2022 11:33 PM
I see you use pandas to read from DBFS.
But pandas will only read from local files;
see this topic also. It is about databricks-connect, but the same principles apply.
So what you should do is first read the file using spark.read.csv and then convert the Spark DataFrame to a pandas DataFrame.
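A minimal sketch of that two-step approach, assuming a Databricks notebook where `spark` (the SparkSession) is already available; the helper name `csv_to_pandas` and the reader options are my own, not from the post:

```python
# Sketch: read a CSV from DBFS with Spark, then convert to pandas,
# since pandas cannot open dbfs:/ paths directly.

def csv_to_pandas(spark, path="dbfs:/FileStore/tables/world_bank.csv"):
    """Read a CSV from DBFS via Spark and return a pandas DataFrame."""
    sdf = (spark.read.format("csv")
           .option("header", "true")       # first row holds column names
           .option("inferSchema", "true")  # infer column types from the data
           .load(path))
    # toPandas() collects all rows to the driver: fine for small files,
    # risky for very large ones.
    return sdf.toPandas()
```

Note that `.toPandas()` pulls the full dataset onto the driver node, so this only makes sense for data that fits in driver memory.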
08-07-2024 07:02 AM
I had the same issue: geopandas in Databricks notebooks does not open shapefiles from an Azure Storage mount.
I managed to copy the shapefile to local storage on the driver using
dbutils.fs.cp(shapefile_path, f"file:{local_shapefile_copy_dest_path}")
The 'file:' prefix proved to be crucial here.
Then I read the local copy:
gdf = gpd.read_file(local_shapefile_copy_dest_path)
display(gdf)
Afterwards I copy the results back to the DBFS mount using
dbutils.fs.cp(f"file:{geoparquet_path}", f"{raw_path}{geoparquet_file_basename}.parquet")
@-werners- wrote: I see you use pandas to read from DBFS.
But pandas will only read from local files;
see this topic also. It is about databricks-connect, but the same principles apply.
So what you should do is first read the file using spark.read.csv and then convert the Spark DataFrame to a pandas DataFrame.
08-08-2024 06:30 AM
I convert to parquet using
gdf.to_parquet(f"/dbfs{raw_path}/{file_name}.parquet")
05-24-2022 11:52 PM
Hi
you can try:
my_df = (spark.read.format("csv")
    .option("inferSchema", "true")  # infer column types from the data
    .option("sep", ",")             # if your file uses "," as the separator
    .option("header", "true")       # if your file has the header in the first row
    .load("/FileStore/tables/CREDIT_1.CSV"))
display(my_df)
As you can see above, my_df is a Spark DataFrame, and from there you can continue with your code.