01-04-2023 01:04 PM
Databricks Community Edition 10.4 LTS ML (Apache Spark 3.2.1, Scala 2.12) has the same problem with pd.read_csv.
The spark.read statement replaces the original column names with (_c0, _c1,…) unless .option("header", "true") is used.
The following forms should work:
path = 'dbfs:/FileStore/tables/POS_CASH_balance.csv'

# Form 1: the csv() shortcut
df = (spark.read
      .option("header", "true")
      .csv(path))

# Form 2: explicit format plus load()
df = (spark.read
      .format("csv")
      .option("header", "true")
      .load(path))