I'm having trouble creating a DataFrame with a date column when reading a CSV file. What am I doing wrong?
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, DateType, FloatType
spark = SparkSession.builder.appName('DataFrame').getOrCreate()
schema = StructType() \
.add("DATE", DateType(), True) \
.add("A", FloatType(), True) \
.add("B", FloatType(), True)
df = spark.read.format("csv").option("header", True).option("dateFormat", "MM/dd/yyyy").schema(schema).load("test.csv")
df.show()
This is the error I'm getting:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 158.0 failed 4 times, most recent failure: Lost task 0.3 in stage 158.0 (TID 1823) (10.237.208.145 executor 5): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: