Hi community,
I don't know what is happening TBH.
I have a use case where data is written to the location "dbfs:/mnt/..." (don't ask me why it's mounted, it's just a side project). I believe the data is actually stored in ADLS Gen2.
I've been trying to read the data after it's written, but when I try to read it from the folder:
df = spark.read.format("parquet").load("dbfs:/mnt/table/")
or
df = spark.read.format("parquet").load("dbfs:/mnt/table/date=2022-12-16")
I get: AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
When I provide the schema manually, the count is 0 (zero):
df.count()
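In case it helps, the schema-specified read looks roughly like this (a minimal sketch: the column names and types below are placeholders, the real schema is wider):

from pyspark.sql.types import StructType, StructField, StringType, LongType

# placeholder schema, not the real column names/types
schema = StructType([
    StructField("id", LongType(), True),
    StructField("value", StringType(), True),
])

df = spark.read.format("parquet").schema(schema).load("dbfs:/mnt/table/date=2022-12-16")
df.count()  # still returns 0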
but when I provide the full path to the Parquet file, it works:
df = spark.read.format("parquet").load("dbfs:/mnt/table/date=2022-12-16/some-spark-file.snappy.parquet")
df.count()
It returns 700 rows.
Any ideas? 🙂