Date schema issues with pyspark dataframe creation
11-03-2022 08:34 PM
I'm having trouble creating a DataFrame with a date column. What am I doing wrong?
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.sql.types import DateType, FloatType
spark = SparkSession.builder.appName('DataFrame').getOrCreate()
schema = StructType() \
.add("DATE", DateType(), True) \
.add("A", FloatType(), True) \
.add("B", FloatType(), True)
df = spark.read.format("csv").option("header", True).option("dateFormat", "MM/dd/yyyy").schema(schema).load("test.csv")
df.show()
This is the error I'm getting:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 158.0 failed 4 times, most recent failure: Lost task 0.3 in stage 158.0 (TID 1823) (10.237.208.145 executor 5): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:
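The error points at the Spark 3.x datetime parser. One plausible cause, assuming the new parser treats `MM` and `dd` as exactly two digits, is a CSV that mixes zero-padded and unpadded dates. A quick way to check is to scan the raw DATE strings outside Spark. The sketch below uses a hypothetical `find_unpadded_dates` helper (not part of PySpark) and made-up sample values; it lists every value that does not fit a zero-padded `MM/dd/yyyy` layout.

```python
import re

# Hypothetical check: a strict "MM/dd/yyyy" layout means two-digit month,
# two-digit day and four-digit year, separated by slashes.
STRICT_MM_DD_YYYY = re.compile(r"\d{2}/\d{2}/\d{4}")


def find_unpadded_dates(values):
    """Return the date strings that do not fit a zero-padded MM/dd/yyyy layout."""
    return [v for v in values if not STRICT_MM_DD_YYYY.fullmatch(v.strip())]


# Illustrative values, as they might appear in the DATE column of test.csv.
sample = ["03/01/2022", "3/1/2022", "12/1/2022", "12/01/2022"]
print(find_unpadded_dates(sample))  # ['3/1/2022', '12/1/2022']
```

If this returns anything for the real file, those rows are the likely ones the `MM/dd/yyyy` pattern cannot parse.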
Labels: Date Column, DateSchema, Pyspark
11-03-2022 10:06 PM
Hi @Kevin Kim, could you please try upgrading the Spark version? Also, please provide the full error log.
11-22-2022 09:22 AM
Hi @Kaniz Fatma,
I changed the date format to 'M/d/Y' and it no longer threw any errors. I found that my CSV file has dates like '3/1/2022'. Could that be the issue? But some dates also look like '12/1/2022', so I'm a bit confused.
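Both '3/1/2022' and '12/1/2022' leave at least one field unpadded (the day in both, the month in the first), so both would fail a pattern that requires two digits for month and day. In Spark 3.x a single-letter pattern such as `M/d/yyyy` is generally what you want for unpadded dates; note that lowercase `yyyy` is the calendar year, while uppercase `Y` is the week-based year. If changing the pattern is not an option, another route is to zero-pad the column before Spark reads the file. The sketch below shows that pre-processing step in plain Python. `normalize_date` and `normalize_csv` are hypothetical helpers, the column name `DATE` is taken from the schema in the question, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime


def normalize_date(value):
    """Rewrite an M/d/yyyy or MM/dd/yyyy string as zero-padded MM/dd/yyyy."""
    # Python's strptime accepts one or two digits for %m and %d.
    parsed = datetime.strptime(value.strip(), "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")


def normalize_csv(text, column="DATE"):
    """Return CSV text with the given date column zero-padded (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = normalize_date(row[column])
        writer.writerow(row)
    return out.getvalue()


# Made-up rows shaped like the schema in the question (DATE, A, B).
raw = "DATE,A,B\n3/1/2022,1.0,2.0\n12/1/2022,3.5,4.5\n"
print(normalize_csv(raw))
```

After this step every date has the form `MM/dd/yyyy`, so the original `dateFormat` option should match all rows. Changing the pattern in Spark is usually simpler, since it avoids rewriting the file.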

