Pyspark datatype missing microsecond precision last three SSS: h:mm:ss:SSSSSS - datetype

jimbo · ‎11-18-2023

Hi all,

We are having issues with the date/timestamp data types in Spark when ingesting files.

The source data has six digits of fractional-second (microsecond) precision, but the most we can extract from the data type is three. For example, we get 12:03:23.123, but what is required is 12:03:23.123456. The source file has this precision, but it is lost when the file is ingested. Here is an example:

from pyspark.sql.functions import to_timestamp

df.select(
    to_timestamp("date_col", "yyyy-MM-dd").alias("date"),
    to_timestamp("timestamp_col", "yyyy-MM-dd HH:mm:ss.SSS").alias("timestamp"),
).show(truncate=False)

|2022-03-16 12:34:56.789|
|2022-03-16 01:23:45.678|

The requirement is for output such as |2022-03-16 12:34:56.456789|.
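To make the target concrete, here is a minimal sketch of what we are aiming for. It assumes Spark 3.x, where the fraction pattern can contain up to six significant S characters; we have not confirmed that it behaves this way on our cluster. The helper name add_micro_timestamp and the constant MICRO_PATTERN are made up for illustration, and the column names are the ones from the example above.

```python
from datetime import datetime

# Pattern with six fractional-second digits (microseconds).
# Assumption: Spark 3.x datetime patterns, not verified on our cluster.
MICRO_PATTERN = "yyyy-MM-dd HH:mm:ss.SSSSSS"


def add_micro_timestamp(df, src_col="timestamp_col", dst_col="timestamp"):
    """Hypothetical helper: parse src_col into a timestamp column dst_col."""
    # Imported lazily so the plain-Python check below runs without Spark installed.
    from pyspark.sql.functions import to_timestamp

    return df.withColumn(dst_col, to_timestamp(src_col, MICRO_PATTERN))


# Plain-Python check that the sample value really carries six fractional digits.
sample = "2022-03-16 12:34:56.456789"
parsed = datetime.strptime(sample, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.microsecond)  # 456789
```

With a DataFrame df like the one above, add_micro_timestamp(df).show(truncate=False) is what we would hope prints the full fractional part. If a string column is needed instead, date_format with the same pattern might be an option.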

What is the best way to do this?

Many thanks

Jay
