Databricks Community

b_1 · ‎06-20-2023

I have this datetime string in my dataset: '2023061218154258', and I want to convert it to a timestamp using the code below. However, the format that I expect to work, yyyyMMddHHmmssSS, does not. This code reproduces the issue:

from pyspark.sql.functions import to_timestamp
spark.conf.set("spark.sql.legacy.timeParserPolicy","CORRECTED")
# If the config is set to CORRECTED then the conversion will return null instead of throwing an exception.
 
df=spark.createDataFrame(
         data=[ ("1",  "2023061218154258")
                , ("2", "20230612181542.58")]
        ,schema=["id","input_timestamp"])
df.printSchema()
 
# Timestamp string to TimestampType
df.withColumn("timestamp", to_timestamp("input_timestamp", format='yyyyMMddHHmmssSS')).show(truncate=False)
df.withColumn("timestamp", to_timestamp("input_timestamp", format='yyyyMMddHHmmss.SS')).show(truncate=False)

output:

+---+-----------------+---------+
|id |input_timestamp  |timestamp|
+---+-----------------+---------+
|1  |2023061218154258 |null     |
|2  |20230612181542.58|null     |
+---+-----------------+---------+
 
+---+-----------------+----------------------+
|id |input_timestamp  |timestamp             |
+---+-----------------+----------------------+
|1  |2023061218154258 |null                  |
|2  |20230612181542.58|2023-06-12 18:15:42.58|
+---+-----------------+----------------------+

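I tried to_timestamp with the format yyyyMMddHHmmssSS and expected it to convert the string 2023061218154258 into the timestamp 2023-06-12 18:15:42.58. Instead it returns null, while the same value with a dot before the fraction parses correctly with yyyyMMddHHmmss.SS.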
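
As a cross-check of the expected value outside Spark, here is a minimal sketch using Python's standard datetime module. It does not use Spark's parser and says nothing about why Spark returns null; it only confirms what the string should mean. The variable names are illustrative.

```python
from datetime import datetime

raw = "2023061218154258"

# %f accepts one to six digits and zero-pads on the right,
# so the trailing "58" is read as 0.58 seconds.
expected = datetime.strptime(raw, "%Y%m%d%H%M%S%f")
print(expected)  # 2023-06-12 18:15:42.580000
```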

When I change the line

spark.conf.set("spark.sql.legacy.timeParserPolicy","CORRECTED")

into

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")

the issue is solved, but I don't want to use legacy mode (because it causes other issues).
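
One possible workaround that stays in CORRECTED mode, sketched under assumptions: the output above shows that the dotted pattern yyyyMMddHHmmss.SS does parse, so the undotted strings could be rewritten to the dotted form before calling to_timestamp. The names FRACTION_PATTERN, insert_fraction_separator and with_parsed_timestamp are hypothetical, and the sketch assumes every undotted value is exactly 14 digits followed by a 2-digit fraction.

```python
import re

# Assumption: values are 14 digits (yyyyMMddHHmmss) followed by a 2-digit fraction.
FRACTION_PATTERN = r"^(\d{14})(\d{2})$"


def insert_fraction_separator(value: str) -> str:
    """Plain-Python mirror of the rewrite, handy for checking the regex locally."""
    return re.sub(FRACTION_PATTERN, r"\1.\2", value)


def with_parsed_timestamp(df, source_col="input_timestamp", target_col="timestamp"):
    """Add target_col by normalising source_col to the dotted form, then parsing it."""
    # Imported here so the helper above can be tried without a Spark installation.
    from pyspark.sql.functions import col, regexp_replace, to_timestamp

    # Spark's regexp_replace uses Java regex syntax, so groups are written $1 and $2.
    dotted = regexp_replace(col(source_col), FRACTION_PATTERN, "$1.$2")
    return df.withColumn(target_col, to_timestamp(dotted, "yyyyMMddHHmmss.SS"))
```

With the df from the reproduction, with_parsed_timestamp(df).show(truncate=False) would send both rows through the pattern that already works for row 2. Values that already contain the dot do not match the regex and are left unchanged.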

Anonymous · ‎06-20-2023

Hi @Bas van den Berg

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

b_1 · ‎10-11-2023

Does anybody have the same issue, or know whether this is in fact a bug?

to_timestamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS
