10-26-2021 01:40 AM
I am trying to read a CSV file in Databricks, and I am getting an error like: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'
05-24-2022 11:33 PM
I see you use pandas to read from DBFS.
But pandas will only read from local files,
see this topic also. It is about databricks-connect, but the same principles apply.
So what you should do is first read the file using spark.read.csv and then convert the Spark DataFrame to a pandas DataFrame.
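A minimal sketch of that approach, using the path from the original question and assuming the file has a header row:

# Read the CSV with Spark, then convert to pandas for further processing
spark_df = (spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/FileStore/tables/world_bank.csv"))
pandas_df = spark_df.toPandas()  # collects all rows to the driver

Note that toPandas() pulls the full dataset onto the driver, so it is only suitable for data that fits in driver memory.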
08-07-2024 07:02 AM
I had the same issue: geopandas in Databricks notebooks does not open shapefiles from an Azure Storage mount.
I managed to copy the shapefile to the Databricks workspace using
dbutils.fs.cp(shapefile_path, f"file:{local_shapefile_copy_dest_path}")
The 'file:' prefix proved to be crucial here.
And then:
gdf = gpd.read_file(shapefile_path.replace('dbfs:', ''))
gdf.display()
I copy the results back to the DBFS mount using
dbutils.fs.cp(f"file:{geoparquet_path}", f"{raw_path}{geoparquet_file_basename}.parquet")
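One way to put those steps together, as a rough sketch: here I read the local copy produced by the first cp; shapefile_path, local_shapefile_copy_dest_path, geoparquet_path, raw_path and geoparquet_file_basename are placeholder variables from this post, not fixed names.

import geopandas as gpd

# 1. Copy the shapefile from the mount to local disk (the 'file:' prefix marks the local filesystem);
#    remember a shapefile has sidecar files (.shx, .dbf, .prj) that must sit next to the .shp
dbutils.fs.cp(shapefile_path, f"file:{local_shapefile_copy_dest_path}")

# 2. Read the local copy with geopandas
gdf = gpd.read_file(local_shapefile_copy_dest_path)

# 3. Write the result locally as (geo)parquet and copy it back to the mount
gdf.to_parquet(geoparquet_path)
dbutils.fs.cp(f"file:{geoparquet_path}", f"{raw_path}{geoparquet_file_basename}.parquet")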
@-werners- wrote: I see you use pandas to read from DBFS.
But pandas will only read from local files,
see this topic also. It is about databricks-connect, but the same principles apply.
So what you should do is first read the file using spark.read.csv and then convert the Spark DataFrame to a pandas DataFrame.
08-08-2024 06:30 AM
I convert to parquet using
gdf.to_parquet(f"/dbfs{raw_path}/{file_name}.parquet")
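To check the result, you can read the parquet file back with Spark (using the DBFS path without the /dbfs FUSE prefix); the geometry column comes back as a binary (WKB) column:

result_df = spark.read.parquet(f"{raw_path}/{file_name}.parquet")
display(result_df)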
05-24-2022 11:52 PM
Hi,
you can try:
my_df = (spark.read.format("csv")
    .option("inferSchema", "true")  # to get the types from your data
    .option("sep", ",")             # if your file is using "," as separator
    .option("header", "true")       # if your file has the header in the first row
    .load("/FileStore/tables/CREDIT_1.CSV"))
display(my_df)
From the above you can see that my_df is a Spark DataFrame, and from there you can continue with your code.
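If you then need a pandas DataFrame, as in the original question, you can convert the Spark DataFrame afterwards; note that this collects all rows to the driver, so it only suits data that fits in driver memory:

my_pandas_df = my_df.toPandas()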

