I have a Databricks 5.3 cluster on Azure which runs Apache Spark 2.4.0 and Scala 2.11.
I'm trying to parse a CSV file with a custom timestamp format, but I don't know which datetime pattern syntax Spark expects.
My CSV looks like this:
Timestamp, Name, Value
02/07/2019 14:51:32.869-08:00, BatteryA, 0.25
02/07/2019 14:55:45.343-08:00, BatteryB, 0.50
02/07/2019 14:58:25.845-08:00, BatteryC, 0.34
I'm executing the following to read it:

val csvDataFrame = sqlContext.read.format("csv")
  .option("header", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("mode", "DROPMALFORMED")
  .option("timestampFormat", "MM/dd/yyyy HH:mm:ss.SSSZZ")
  .load("path/to/file.csv")
csvDataFrame.printSchema()
But no matter what timestamp pattern I use, the first column is always inferred as string.
csvDataFrame:org.apache.spark.sql.DataFrame
Timestamp:string
Name:string
Value:double
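For what it's worth, this is the kind of explicit conversion I would fall back to after reading the column as a string, but it only underlines my question: I'm guessing at the pattern syntax here, and the XXX for the -08:00 offset is purely an assumption on my part:

import org.apache.spark.sql.functions.{col, to_timestamp}

// Sketch only: cast the string column explicitly after the read.
// "MM/dd/yyyy HH:mm:ss.SSSXXX" is my guess at the pattern; XXX is meant to
// match an offset written as -08:00, but that's exactly what I'm unsure about.
val withTimestamp = csvDataFrame.withColumn(
  "Timestamp",
  to_timestamp(col("Timestamp"), "MM/dd/yyyy HH:mm:ss.SSSXXX")
)
withTimestamp.printSchema()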
I'm not a Java/Scala developer and I'm new to Spark/Databricks. I can't find documentation anywhere on which datetime formatter Spark uses to parse these values.