Problems with pandas.read_parquet() and path
11-30-2022 11:20 AM
I am doing the "Data Engineering with Databricks V2" learning path.
I cannot run "DE 4.2 - Providing Options for External Sources" because the first code cell does not run successfully:
%run ../Includes/Classroom-Setup-04.2
Screenshot 1:
Inside the setup notebook, the code crashes at the following command (see screenshot 2):
df = pd.read_parquet(path = datasource_path.replace("dbfs:/", '/dbfs/'))
The error message is:
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/ecommerce/raw/users-historical'
Screenshot 2:
There seems to be an issue with the path, even though it actually exists:
Screenshot 3:
I played around a little with the path specification, but nothing helped:
Screenshot 4:
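For anyone debugging the same error, here is a minimal sketch of a check that may help narrow it down. It assumes the setup notebook relies on the local /dbfs mount, since pandas opens files through the local filesystem rather than through Spark. The helper names `to_fuse_path` and `describe_path` are hypothetical and not part of the course code. As far as I understand, a path can show up in a dbfs:/ listing and still be missing under /dbfs/ if that mount is not available on the cluster.

```python
import os


def to_fuse_path(dbfs_uri: str) -> str:
    """Map a 'dbfs:/...' URI to the local '/dbfs/...' path that pandas would open."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        # Already a local path (or some other scheme): leave it untouched.
        return dbfs_uri
    return "/dbfs/" + dbfs_uri[len(prefix):]


def describe_path(dbfs_uri: str) -> str:
    """Report whether local file APIs (and therefore pandas) can see the path."""
    local_path = to_fuse_path(dbfs_uri)
    if not os.path.isdir("/dbfs"):
        return "no /dbfs mount on this machine"
    if not os.path.exists(local_path):
        return f"/dbfs is mounted, but {local_path} is missing"
    return f"{local_path} is visible to local file APIs"


# Path taken from the error message above, written back in dbfs:/ form.
print(describe_path(
    "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/ecommerce/raw/users-historical"
))
```

If the first message comes back, the path itself is probably fine and the problem is that pandas cannot reach DBFS from that cluster at all.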
- Labels: Databricks V2, Parquet, Path
10-16-2024 08:34 AM - edited 10-16-2024 08:35 AM
The Spark solution worked for me instead of pd.read_parquet():
spark_df = spark.read.parquet("dbfs:/mnt/dbacademy-datasets/data-engineer-learning-path/v04/ecommerce/raw/users-historical/")
# Convert to pandas DataFrame
df = spark_df.toPandas()
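If you want a single call that works on both kinds of cluster, the workaround could be wrapped roughly like this. It is only a sketch: `load_parquet_as_pandas` is a hypothetical helper, not something from the course notebooks, and it assumes `spark` is the SparkSession that Databricks notebooks provide.

```python
import os


def load_parquet_as_pandas(datasource_path: str, spark):
    """Read a Parquet dataset into pandas, falling back to Spark if /dbfs is not visible.

    `spark` is assumed to be an active SparkSession.
    """
    # Same conversion the setup notebook uses, limited to the leading scheme.
    local_path = datasource_path.replace("dbfs:/", "/dbfs/", 1)

    if os.path.exists(local_path):
        # Local mount is available, so pandas can read the files directly.
        import pandas as pd
        return pd.read_parquet(local_path)

    # Otherwise let Spark resolve the dbfs:/ URI, then convert the result.
    return spark.read.parquet(datasource_path).toPandas()
```

Keep in mind that toPandas() pulls the whole dataset onto the driver, so this only makes sense for small tables like the one in this lesson.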
10-24-2024 11:45 PM
Thanks for sharing, bro. It really helped.