Data Engineering
How to set the timestamp format when reading CSV

EmilianoParizz1
New Contributor

I have a Databricks 5.3 cluster on Azure which runs Apache Spark 2.4.0 and Scala 2.11.

I'm trying to parse a CSV file with a custom timestamp format, but I don't know which datetime pattern syntax Spark uses.

My CSV looks like this:
Timestamp, Name, Value  
02/07/2019 14:51:32.869-08:00, BatteryA, 0.25  
02/07/2019 14:55:45.343-08:00, BatteryB, 0.50  
02/07/2019 14:58:25.845-08:00, BatteryC, 0.34

I'm executing the following to read it:

val csvDataFrame = sqlContext.read.format("csv")
  .option("header", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("mode", "DROPMALFORMED")
  .option("timestampFormat", "MM/dd/yyyy HH:mm:ss.SSSZZ")
  .load("path/to/file.csv")

csvDataFrame.printSchema()

But no matter what timestamp pattern I use, the first column is always inferred as string.
csvDataFrame:org.apache.spark.sql.DataFrame
  Timestamp:string
  Name:string
  Value:double

I'm not a Java/Scala developer and I'm new to Spark/Databricks. I can't find anywhere which datetime formatter Spark uses to parse the values.

3 REPLIES

mekkinen
New Contributor II

At least based on the PySpark documentation (https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader), it is stated that:

"

  • dateFormat – sets the string that indicates a date format. Custom date formats follow the formats at
    java.text.SimpleDateFormat
    . This applies to date type. If None is set, it uses the default value,
    yyyy-MM-dd
    .
  • timestampFormat – sets the string that indicates a timestamp format. Custom date formats follow the formats at
    java.text.SimpleDateFormat
    . This applies to timestamp type. If None is set, it uses the default value,
    yyyy-MM-dd'T'HH:mm:ss.SSSXXX
    .

"

I would imagine these are the same when writing Scala.
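For what it's worth, you can sanity-check the pattern letters outside Spark with java.text.SimpleDateFormat itself, the class those docs point to. A minimal sketch against the question's data, assuming XXX for the colon-separated offset; this is not necessarily the exact code path Spark's CSV reader takes internally:

import java.text.SimpleDateFormat

// XXX is the ISO 8601 pattern letter for a colon-separated offset like "-08:00"
val fmt = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss.SSSXXX")
val parsed = fmt.parse("02/07/2019 14:51:32.869-08:00")
println(parsed) // a java.util.Date, printed in the JVM's default time zone

If this parses, the pattern string itself is valid; whether schema inference then picks it up can still depend on the Spark version.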

DonatienTessier
New Contributor III

Hi @Emiliano Parizzi,

You can parse the timestamp after loading the file by using withColumn (cf. https://stackoverflow.com/questions/39088473/pyspark-dataframe-convert-unusual-string-format-to-time...).

from pyspark.sql import Row
from pyspark.sql.functions import to_timestamp

(sc
 .parallelize([Row(dt='02/07/2019 14:51:32.869-08:00')])
 .toDF()
 .withColumn("parsed", to_timestamp("dt", "MM/dd/yyyy HH:mm:ss.SSSXXX"))
 .show(1, False))

+-----------------------------+-------------------+
|dt                           |parsed             |
+-----------------------------+-------------------+
|02/07/2019 14:51:32.869-08:00|2019-02-07 22:51:32|
+-----------------------------+-------------------+
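Note that the parsed value is rendered in the session time zone (14:51:32-08:00 shows up as 22:51:32, presumably UTC here), and the .869 milliseconds are dropped by to_timestamp in Spark 2.x.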

SteveDocherty
New Contributor II

# In Python: explicitly define the schema, then read in the CSV data using the
# schema and a defined timestamp format [and an extra column to be used for
# partitioning; this part is optional].
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import (StructType, StructField, TimestampType,
                               StringType, DoubleType, IntegerType)

csvSchema = StructType([
    StructField("Timestamp", TimestampType(), True),
    StructField("Name", StringType(), True),
    StructField("Value", DoubleType(), True)
])

df = spark.read \
    .csv(file_path,
         header=True,
         multiLine=True,
         escape="\"",
         schema=csvSchema,
         timestampFormat="MM/dd/yyyy HH:mm:ss.SSSZZ") \
    .withColumn("year", date_format(col("Timestamp"), "yyyy").cast(IntegerType())) \
    .withColumn("month", date_format(col("Timestamp"), "MM").cast(IntegerType()))

display(df)
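Because the schema is supplied up front, Spark skips type inference and applies timestampFormat directly while parsing, so Timestamp comes back as TimestampType instead of string.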

