Hi @Frantz! Setting the header option to "true" tells PySpark to use the first row of your CSV file as the column names, so you avoid the generated default names (_c0, _c1, etc.) when loading CSV data into an external table. You can do it like this:
# Assuming you have a CSV file named "file.csv"
dff = spark.read.format("csv") \
    .option("delimiter", ",") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("file.csv")
This snippet sets header to "true" so that the first row of the CSV file is treated as column names rather than data. Setting inferSchema to "true" additionally lets Spark scan the data and infer each column's type instead of defaulting everything to string. As a result, the DataFrame dff carries the column names from your CSV file rather than the generic _c0, _c1, etc.
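For intuition, the same first-row-as-header idea can be seen with Python's built-in csv.DictReader, which also treats the first line as field names. This is just a small standalone sketch with made-up sample data, not Spark itself:

```python
import csv
import io

# Hypothetical sample data standing in for a small CSV file.
data = "name,age\nAlice,30\nBob,25\n"

# DictReader takes the first line as the column names,
# analogous to Spark's header option set to "true".
rows = list(csv.DictReader(io.StringIO(data)))

print(rows[0]["name"])  # data rows are keyed by the header names
print(rows[1]["age"])
```

Note that unlike Spark with inferSchema, DictReader leaves every value as a string; Spark's inferSchema is what gives you typed columns on top of the named ones.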