Pandas finds parquet file, Spark does not
12-29-2023 12:02 PM
I am having an issue with Databricks (Community Edition): I can use Pandas to read a Parquet file into a DataFrame, but when I use Spark, it reports that the file doesn't exist. I have tried reformatting the file path for Spark, but I can't find a format that it will accept.
Any ideas?
Pandas:
import pandas as pd
parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = pd.read_parquet(parquet_file_path, engine='pyarrow')
display(df)
Result:
Spark:
parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = spark.read.parquet(parquet_file_path)
df.show()
Result:
AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/green_tripdata_2022-02.parquet.
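For anyone comparing the two paths: the error shows Spark resolving the string to dbfs:/dbfs/green_tripdata_2022-02.parquet, so the leading /dbfs is being treated as a folder inside DBFS rather than as the local mount point that Pandas sees. Below is a minimal sketch of translating the Pandas-style path into a Spark-style one. The helper name to_spark_path is hypothetical (not part of any Databricks API), and it assumes the file really lives in the DBFS root and is only exposed under /dbfs through the local mount.

```python
def to_spark_path(local_path: str) -> str:
    """Turn a local-style /dbfs/... path into a dbfs:/... URI for Spark.

    Hypothetical helper for illustration; assumes /dbfs is the local
    mount of the DBFS root. Paths without that prefix are returned as-is.
    """
    prefix = "/dbfs/"
    if local_path.startswith(prefix):
        # Drop the mount prefix and add the dbfs: scheme instead.
        return "dbfs:/" + local_path[len(prefix):]
    return local_path


parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
spark_path = to_spark_path(parquet_file_path)
print(spark_path)  # dbfs:/green_tripdata_2022-02.parquet
```

In a notebook, the result would then be passed to Spark as spark.read.parquet(spark_path). If that still fails with PATH_NOT_FOUND, the file may exist only on the driver's local disk rather than in DBFS; in that case a file: URI such as "file:/dbfs/green_tripdata_2022-02.parquet" is the other form worth trying, though whether it works depends on the cluster setup.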