Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to set the timestamp format when reading CSV

EmilianoParizz1
New Contributor

I have a Databricks 5.3 cluster on Azure which runs Apache Spark 2.4.0 and Scala 2.11.

I'm trying to parse a CSV file with a custom timestamp format but I don't know which datetime pattern format Spark uses.

My CSV looks like this:
Timestamp, Name, Value  
02/07/2019 14:51:32.869-08:00, BatteryA, 0.25  
02/07/2019 14:55:45.343-08:00, BatteryB, 0.50  
02/07/2019 14:58:25.845-08:00, BatteryC, 0.34

I'm executing the following to read it:

val csvDataFrame = sqlContext.read.format("csv")
  .option("header", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("mode", "DROPMALFORMED")
  .option("timestampFormat", "MM/dd/yyyy HH:mm:ss.SSSZZ")
  .load("path/to/file.csv")

csvDataFrame.printSchema()

But no matter what timestamp pattern I use, the first column is always inferred as string.
csvDataFrame:org.apache.spark.sql.DataFrame
  Timestamp:string
  Name:string
  Value:double

I'm not a Java/Scala developer and I'm new to Spark/Databricks. I can't find documented anywhere which datetime formatter Spark uses to parse these values.

4 REPLIES

mekkinen
New Contributor II

At least based on the PySpark documentation (https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader), it is stated that:

"
  • dateFormat – sets the string that indicates a date format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to date type. If None is set, it uses the default value, yyyy-MM-dd.
  • timestampFormat – sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss.SSSXXX.
"

I would imagine these are the same when writing Scala.
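If you want to sanity-check the pattern against the sample value outside Spark, a rough Python equivalent of the SimpleDateFormat pattern MM/dd/yyyy HH:mm:ss.SSSXXX is the sketch below (standard library only; note %z accepts a colon offset like -08:00 on Python 3.7+):

```python
from datetime import datetime

# Sample value from the CSV in the question.
raw = "02/07/2019 14:51:32.869-08:00"

# Rough Python equivalent of SimpleDateFormat "MM/dd/yyyy HH:mm:ss.SSSXXX":
# %f covers the fractional seconds, %z accepts the "-08:00" colon offset.
ts = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S.%f%z")

print(ts.isoformat())  # 2019-02-07T14:51:32.869000-08:00
```

If this parses, the field layout of the pattern matches the data; what remains is choosing the right offset pattern letters for the formatter Spark actually uses.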

DonatienTessier
Contributor

Hi @Emiliano Parizzi,

You could parse the timestamp after loading the file, using withColumn (cf. https://stackoverflow.com/questions/39088473/pyspark-dataframe-convert-unusual-string-format-to-time...).

from pyspark.sql import Row
from pyspark.sql.functions import to_timestamp

(sc
 .parallelize([Row(dt='02/07/2019 14:51:32.869-08:00')])
 .toDF()
 .withColumn("parsed", to_timestamp("dt", "MM/dd/yyyy HH:mm:ss.SSSXXX"))
 .show(1, False))

+-----------------------------+-------------------+
|dt                           |parsed             |
+-----------------------------+-------------------+
|02/07/2019 14:51:32.869-08:00|2019-02-07 22:51:32|
+-----------------------------+-------------------+
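Note the parsed value reads 2019-02-07 22:51:32 rather than 14:51:32 because to_timestamp normalizes the instant to the session time zone (UTC here). A quick pure-Python check of that conversion (standard library only, Python 3.7+ for %z with a colon offset):

```python
from datetime import datetime, timezone

# Parse the sample value with its -08:00 offset, then normalize to UTC,
# which is what the to_timestamp output above reflects.
local = datetime.strptime("02/07/2019 14:51:32.869-08:00",
                          "%m/%d/%Y %H:%M:%S.%f%z")
utc = local.astimezone(timezone.utc)

print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2019-02-07 22:51:32
```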

SteveDocherty
New Contributor II

# in Python: explicitly define the schema, read in the CSV data using the schema
# and a defined timestamp format [and extra columns to be used for partitioning;
# this part is optional]
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import (StructType, StructField, TimestampType,
                               StringType, DoubleType, IntegerType)

csvSchema = StructType([
    StructField("Timestamp", TimestampType(), True),
    StructField("Name", StringType(), True),
    StructField("Value", DoubleType(), True)
])

df = spark.read \
    .csv(file_path,
         header = True,
         multiLine = True,
         escape = "\"",
         schema = csvSchema,
         timestampFormat = "MM/dd/yyyy HH:mm:ss.SSSZZ") \
    .withColumn("year", date_format(col("Timestamp"), "yyyy").cast(IntegerType())) \
    .withColumn("month", date_format(col("Timestamp"), "MM").cast(IntegerType()))

display(df)
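For reference, the same explicit-typing idea can be sketched without Spark at all, using only Python's standard library on the sample rows from the question (the column names and format string mirror the schema above; this is just an illustration of the typing and partition-column derivation, not the Spark code path):

```python
import csv
import io
from datetime import datetime

# The sample data from the question, inlined to keep the sketch self-contained.
sample = """Timestamp, Name, Value
02/07/2019 14:51:32.869-08:00, BatteryA, 0.25
02/07/2019 14:55:45.343-08:00, BatteryB, 0.50
02/07/2019 14:58:25.845-08:00, BatteryC, 0.34
"""

# Apply the same explicit types as csvSchema: timestamp, string, double.
rows = []
for rec in csv.DictReader(io.StringIO(sample), skipinitialspace=True):
    rows.append({
        "Timestamp": datetime.strptime(rec["Timestamp"],
                                       "%m/%d/%Y %H:%M:%S.%f%z"),
        "Name": rec["Name"],
        "Value": float(rec["Value"]),
    })

# Derive the optional partition columns, like the two withColumn calls above.
for row in rows:
    row["year"] = row["Timestamp"].year
    row["month"] = row["Timestamp"].month

print(rows[0]["Name"], rows[0]["year"], rows[0]["month"])  # BatteryA 2019 2
```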

