The Databricks notebook failed yesterday due to a timestamp format issue.
Error:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2022-08-10 00:00:14.2760000' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
The notebook had been running fine before this; for example, the ts column contains timestamp values like "2022-08-07T23:59:57.9740000".
We use an explicit timestampFormat of 'yyyy-MM-dd HH:mm:ss.SSS' when reading the CSV files.
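The read looked roughly like this (simplified from the actual job; these are the standard Spark CSV reader options):

    # Simplified sketch of the original read with the explicit format
    delta_df = (
        spark.read
        .option("header", "true")
        .option("delimiter", ",")
        .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS")  # three fractional digits
        .schema(schema_struct)
        .csv(filesToProcess)
    )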
However, we started getting nulls in the timestamp column because the values could not be converted.
So I changed the format to 'yyyy-MM-dd HH:mm:ss.SSSSSSS' and it worked for one of the objects, but the issue remained for another object.
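A standalone check (hypothetical repro, not our job code) shows the widened pattern parsing the sample value, while the old three-digit pattern raises the SparkUpgradeException quoted above under the default parser policy:

    from pyspark.sql import functions as F

    # Hypothetical repro: the seven-S pattern parses the sample value;
    # 'yyyy-MM-dd HH:mm:ss.SSS' fails on it with the new Spark 3 parser.
    sample = spark.createDataFrame([("2022-08-10 00:00:14.2760000",)], ["ts_str"])
    sample.select(
        F.to_timestamp("ts_str", "yyyy-MM-dd HH:mm:ss.SSSSSSS").alias("ts")
    ).show(truncate=False)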
However, when I completely removed the timestampFormat option, the read worked for this last object as well.
I am wondering what changed on the Databricks cluster that caused this to start failing. The timestamp values in the files are in the same format as before.
Here is the function, without the timestampFormat option, that works:
def ReadRawCSV(filesToProcess, header, delimiter, schema_struct):
    # No explicit timestampFormat: Spark falls back to its default timestamp parsing
    delta_df = spark.read.options(header=header, delimiter=delimiter).schema(schema_struct).csv(filesToProcess)
    return delta_df
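We call it like this, for example (the path and schema here are placeholders, not our real ones):

    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Placeholder schema and path for illustration
    schema = StructType([
        StructField("id", StringType(), True),
        StructField("ts", TimestampType(), True),
    ])
    df = ReadRawCSV("/mnt/raw/object_c/*.csv", header="true", delimiter=",", schema_struct=schema)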