Re: Read file from dbfs with pd.read_csv() using d...

martud · ‎01-04-2023

Databricks Community Edition on runtime 10.4 LTS ML (Apache Spark 3.2.1, Scala 2.12) has the same problem with pd.read_csv.

Without .option("header", "true"), spark.read treats the first row as data and names the columns _c0, _c1, … instead of using the original column names.

The following forms should work:

path = 'dbfs:/FileStore/tables/POS_CASH_balance.csv'

df = (spark.read
      .option("header", "true")
      .csv(path))

df = (spark.read
      .format("csv")
      .option("header", "true")
      .load(path))
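
If a pandas DataFrame is what you need in the end, one possible workaround is to read the file with Spark and then convert the result with toPandas(). This is only a minimal sketch: the helper name read_csv_as_pandas is made up for illustration, `spark` is assumed to be the session that a Databricks notebook predefines, and the file is assumed to be small enough to fit in driver memory.

```python
def read_csv_as_pandas(spark, path):
    """Read a CSV through Spark, keeping the header row, and return a pandas DataFrame.

    `spark` is an existing SparkSession; `path` is a URI Spark can read,
    such as a dbfs:/ path.
    """
    sdf = (spark.read
           .option("header", "true")  # use the first row as column names
           .csv(path))
    # toPandas() collects every row to the driver, so keep the input small.
    return sdf.toPandas()

# Hypothetical usage inside a Databricks notebook, where `spark` already exists:
# pdf = read_csv_as_pandas(spark, 'dbfs:/FileStore/tables/POS_CASH_balance.csv')
```

Without a schema or .option("inferSchema", "true"), Spark reads every CSV column as a string, so the resulting pandas columns may need converting afterwards.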