Pandas finds parquet file, Spark does not

JonW · 12-29-2023

I am having an issue with Databricks (Community Edition): I can read a parquet file into a dataframe with Pandas, but when I use Spark it reports that the file doesn't exist. I have tried reformatting the file path for Spark, but I can't find a format that it will accept.

Any ideas?

Pandas:

import pandas as pd

parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = pd.read_parquet(parquet_file_path, engine='pyarrow')
display(df)

Result: (the dataframe displays as expected)

Spark:

parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = spark.read.parquet(parquet_file_path)
df.show()

Result:
AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/green_tripdata_2022-02.parquet.
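
One thing the error message shows is that Spark resolved the path against the dbfs: scheme, so the leading "/dbfs" was treated as a folder inside DBFS rather than as the local mount point that Pandas uses. The sketch below shows the path translation that would apply if the file really is stored in the DBFS root. The helper name fuse_path_to_spark_uri is hypothetical, not a Databricks API. If the file is actually sitting on the driver's local disk under a /dbfs directory (which may be the case on Community Edition), this will not help, and a "file:/dbfs/..." path would be the thing to try instead.

```python
def fuse_path_to_spark_uri(local_path: str) -> str:
    """Turn a '/dbfs/...' local-file-API path into a 'dbfs:/...' URI.

    Hypothetical helper for illustration only. It assumes the file
    lives in DBFS and that '/dbfs' is just the local mount prefix.
    """
    prefix = "/dbfs/"
    if not local_path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}: {local_path!r}")
    # Drop the mount prefix and put the DBFS scheme in front instead.
    return "dbfs:/" + local_path[len(prefix):]


spark_path = fuse_path_to_spark_uri("/dbfs/green_tripdata_2022-02.parquet")
print(spark_path)

# In a Databricks notebook, where `spark` is already defined:
# df = spark.read.parquet(spark_path)
# df.show()
```

Running dbutils.fs.ls("dbfs:/") in the notebook first is a quick way to check whether the file is visible to Spark at all before changing the read call.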