Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Date schema issues with pyspark dataframe creation

ckwan48
New Contributor III

I'm having trouble creating a DataFrame with a date column. What is wrong with my code?

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.sql.types import DateType, FloatType
 
spark = SparkSession.builder.appName('DataFrame').getOrCreate()
schema = StructType() \
      .add("DATE", DateType(), True) \
      .add("A", FloatType(), True) \
      .add("B", FloatType(), True)
 
df = spark.read.format("csv").option("header", True).option("dateFormat", "MM/dd/yyyy").schema(schema).load("test.csv")
 
df.show()

This is the error I'm getting:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 158.0 failed 4 times, most recent failure: Lost task 0.3 in stage 158.0 (TID 1823) (10.237.208.145 executor 5): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:
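This exception generally means that the date parser used since Spark 3.0 rejected a value that the pre-3.0 parser would have accepted. One common cause is a pattern such as MM/dd/yyyy meeting dates whose month or day is not zero-padded. The snippet below is a minimal plain-Python sketch for spotting such values before loading the file. The helper name `unpadded_dates` and the sample values are made up for illustration, and the check only looks at the shape of the string, not at whether the date is valid.

```python
import re

# Shape described by the pattern MM/dd/yyyy: two-digit month, two-digit day,
# four-digit year. This only checks digit counts, not calendar validity.
PADDED_DATE = re.compile(r"\d{2}/\d{2}/\d{4}")


def unpadded_dates(values):
    """Return the values that do not have the zero-padded MM/dd/yyyy shape."""
    return [v for v in values if not PADDED_DATE.fullmatch(v.strip())]


# Hypothetical sample of the DATE column
sample = ["03/01/2022", "3/1/2022", "12/1/2022"]
print(unpadded_dates(sample))
```

If this returns anything, the two-letter fields in the pattern are a likely suspect.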

4 REPLIES

Debayan
Esteemed Contributor III

Hi @Kevin Kim, could you please try upgrading the Spark version? Also, please provide the full error logs.

Kaniz_Fatma
Community Manager

Hi @Kevin Kim, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

ckwan48
New Contributor III

Hi @Kaniz Fatma,

I changed the date format to 'M/d/Y' and it no longer threw any errors. I found that my CSV file has dates like '3/1/2022'. Could that be the issue? Some dates were also like '12/1/2022', though, so I'm a bit confused.
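That is very likely the issue. In Spark 3's datetime patterns, a two-letter field such as MM or dd expects exactly two digits when parsing, while a single letter (M, d) accepts one or two digits. Both '3/1/2022' and '12/1/2022' have a one-digit day, so neither matches MM/dd/yyyy. Below is a minimal sketch of the read with the relaxed pattern. It assumes the calendar-year letter `y` is what is wanted (in Java-style patterns, uppercase `Y` means week-based year). The function names are hypothetical, and the schema is the one from the question written as a DDL string.

```python
def csv_date_options(date_format="M/d/yyyy"):
    """Reader options: header row plus a date pattern whose single-letter
    month and day fields accept both '3/1/2022' and '12/1/2022'."""
    return {"header": True, "dateFormat": date_format}


def read_dated_csv(spark, path, date_format="M/d/yyyy"):
    """Read the CSV with an explicit schema (hypothetical helper).

    `spark` is an existing SparkSession; `path` is the CSV location.
    """
    return (
        spark.read.format("csv")
        .options(**csv_date_options(date_format))
        # Same columns as the StructType in the question, as a DDL string
        .schema("DATE DATE, A FLOAT, B FLOAT")
        .load(path)
    )


# Usage, assuming a SparkSession named `spark` and the file from the question:
# df = read_dated_csv(spark, "test.csv")
# df.show()
```

The other route is to set `spark.sql.legacy.timeParserPolicy` to `LEGACY`, which restores the pre-3.0 parsing behavior, but fixing the pattern is usually the cleaner option.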

Hi @Kevin Kim, could you please reply to @Debayan Mukherjee's response in this thread? Also, please provide the full error logs.
