Re: How to set the timestamp format when reading C...

DonatienTessier · ‎07-16-2019

Hi @Emiliano Parizzi,

You could parsed the timestamp after loading the file with using the withColumn (cf. https://stackoverflow.com/questions/39088473/pyspark-dataframe-convert-unusual-string-format-to-time....

from pyspark.sql import Row from pyspark.sql.functions import to_timestamp

(sc .parallelize([Row(dt='02/07/2019 14:51:32.869-08:00')]) .toDF() .withColumn("parsed", to_timestamp("dt", "MM/dd/yyyy HH:mm:ss.SSSXXX")) .show(1, False))

+-----------------------------+-------------------+ |dt |parsed | +-----------------------------+-------------------+ |02/07/2019 14:51:32.869-08:00|2019-02-07 22:51:32| +-----------------------------+-------------------+