When formatting dates using the yyyyMMddHHmmssSSS pattern, an error occurred

liu
Databricks Partner

An error occurred while converting a timestamp in the yyyyMMddHHmmssSSS format

from pyspark.sql.functions import to_timestamp_ntz, col, lit

df = spark.createDataFrame(
    [("20250730090833000")], ["datetime"])

df2 = df.withColumn("dateformat", to_timestamp_ntz(col("datetime"),lit("yyyyMMddHHmmssSSS")))

df2.display()
 

 error

Text '20250730090833000' could not be parsed at index 0. Use `try_to_timestamp` to tolerate invalid input string and return NULL instead. SQLSTATE: 22007

 

TheOC
Databricks Partner

hi @liu 
I suspect supplying the format as a column is the issue. Could you please try:

df2 = df.withColumn("dateformat", to_timestamp_ntz(col("datetime"),"yyyyMMddHHmmssSSS"))

 

Cheers,
TheOC

liu
Databricks Partner

hi  @TheOC 

Thank you for your response.

If I don’t use lit, I get an error saying this column cannot be found.

szymon_dybczak
Esteemed Contributor III

Hi @liu ,

I think it could be related to following bug in Java. I suspect that internally to_timestamp_ntz uses DateTimeFormatter.

[JDK-8031085] DateTimeFormatter won't parse dates with custom format "yyyyMMddHHmmssSSS" - Java Bug ...

Now what's interesting, if the format has a decimal point before the miliseconds SSS, it can be parsed normally (
such as the format yyyyMMddHHmmss.SSS and enter 20240627235959.999).

So one workaround you can try :

from pyspark.sql.functions import to_timestamp_ntz, col, lit

df = spark.createDataFrame(
    [("20250730090833000")], ["datetime"])

df2 =  df.select(
    "datetime",    
    to_timestamp(
        concat(
            substring("datetime", 1, 14),
            lit('.'),
            substring("datetime", 15, 3)
        ),
        'yyyyMMddHHmmss.SSS'
    ).alias('ts')
)

df2.display()

#20250730090833000

 

liu
Databricks Partner

hi @szymon_dybczak 

Thank you for your respons.

I did apply some substring  concatenation logic to make the conversion work, but the most straightforward way is still using the yyyyMMddHHmmssSSS format.
I checked the link you shared  this appears to be a bug dating back to 2013, yet it's still impacting us in 2025. That’s really unfortunate.