Re: When formatting dates using the yyyyMMddHHmmss...

szymon_dybczak · 07-31-2025

I think this could be related to the following bug in Java. I suspect that to_timestamp_ntz uses DateTimeFormatter internally.

[JDK-8031085] DateTimeFormatter won't parse dates with custom format "yyyyMMddHHmmssSSS" - Java Bug ...

Interestingly, if the format has a decimal point before the milliseconds (SSS), it parses normally (for example, the format yyyyMMddHHmmss.SSS with the input 20240627235959.999).

So one workaround you can try is to insert that decimal point yourself before parsing:

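```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, lit, substring, to_timestamp_ntz

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

# Input: 20250730090833000 (yyyyMMddHHmmssSSS, no separator).
# Note the trailing comma: each row must be a tuple, not a bare string.
df = spark.createDataFrame([("20250730090833000",)], ["datetime"])

df2 = df.select(
    "datetime",
    to_timestamp_ntz(
        # Rebuild the string as yyyyMMddHHmmss + '.' + SSS
        concat(
            substring("datetime", 1, 14),   # yyyyMMddHHmmss
            lit("."),
            substring("datetime", 15, 3),   # SSS
        ),
        # to_timestamp_ntz takes the format as a Column, hence lit()
        lit("yyyyMMddHHmmss.SSS"),
    ).alias("ts"),
)

df2.display()  # Databricks notebooks; use df2.show(truncate=False) elsewhere
```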
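If you want to sanity-check the index arithmetic outside Spark, here is a plain-Python sketch of the same string transformation. add_fraction_separator is just a hypothetical helper name, and Python's datetime is used only to confirm the dotted string is well-formed; it says nothing about how Spark parses internally.

```python
from datetime import datetime


def add_fraction_separator(raw: str) -> str:
    """Turn 'yyyyMMddHHmmssSSS' digits into 'yyyyMMddHHmmss.SSS' (hypothetical helper).

    Python slices are 0-based and end-exclusive, so raw[0:14] corresponds to
    Spark's substring(col, 1, 14) and raw[14:17] to substring(col, 15, 3).
    """
    if len(raw) != 17 or not raw.isdigit():
        raise ValueError(f"expected 17 digits, got {raw!r}")
    return raw[0:14] + "." + raw[14:17]


dotted = add_fraction_separator("20250730090833000")
print(dotted)

# Python's %f accepts 1 to 6 fractional digits, so the dotted string parses directly
ts = datetime.strptime(dotted, "%Y%m%d%H%M%S.%f")
print(ts)
```

The main thing to watch when porting between the two is the off-by-one: Spark's substring positions start at 1, Python's slices start at 0.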