cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

When formatting dates using the yyyyMMddHHmmssSSS pattern, an error occurred

liu
New Contributor III

An error occurred while converting a timestamp in the yyyyMMddHHmmssSSS format

from pyspark.sql.functions import to_timestamp_ntz, col, lit

df = spark.createDataFrame(
    [("20250730090833000")], ["datetime"])

df2 = df.withColumn("dateformat", to_timestamp_ntz(col("datetime"),lit("yyyyMMddHHmmssSSS")))

df2.display()
 

 error

Text '20250730090833000' could not be parsed at index 0. Use `try_to_timestamp` to tolerate invalid input string and return NULL instead. SQLSTATE: 22007

 

4 REPLIES 4

TheOC
Contributor

hi @liu 
I suspect supplying the format as a column is the issue. Could you please try:

df2 = df.withColumn("dateformat", to_timestamp_ntz(col("datetime"),"yyyyMMddHHmmssSSS"))

 

Cheers,
TheOC

liu
New Contributor III

hi  @TheOC 

Thank you for your response.

If I don’t use lit, I get an error saying this column cannot be found.

szymon_dybczak
Esteemed Contributor III

Hi @liu ,

I think it could be related to following bug in Java. I suspect that internally to_timestamp_ntz uses DateTimeFormatter.

[JDK-8031085] DateTimeFormatter won't parse dates with custom format "yyyyMMddHHmmssSSS" - Java Bug ...

Now what's interesting, if the format has a decimal point before the miliseconds SSS, it can be parsed normally (
such as the format yyyyMMddHHmmss.SSS and enter 20240627235959.999).

So one workaround you can try :

from pyspark.sql.functions import to_timestamp_ntz, col, lit

df = spark.createDataFrame(
    [("20250730090833000")], ["datetime"])

df2 =  df.select(
    "datetime",    
    to_timestamp(
        concat(
            substring("datetime", 1, 14),
            lit('.'),
            substring("datetime", 15, 3)
        ),
        'yyyyMMddHHmmss.SSS'
    ).alias('ts')
)

df2.display()

#20250730090833000

 

liu
New Contributor III

hi @szymon_dybczak 

Thank you for your respons.

I did apply some substring  concatenation logic to make the conversion work, but the most straightforward way is still using the yyyyMMddHHmmssSSS format.
I checked the link you shared  this appears to be a bug dating back to 2013, yet it's still impacting us in 2025. That’s really unfortunate.

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now