Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Convert String to Timestamp

vaio
New Contributor II

I have a dataset with one column of string type ('2014/12/31 18:00:36'). How can I convert it to timestamp type with PySpark?

6 REPLIES

vaio
New Contributor II

I am trying to do it this way; however, the result is null.

df2 = df.select(col('starting_timestamp'), df.starting_timestamp.cast('timestamp').alias('time'))

+-------------------+----+
| starting_timestamp|time|
+-------------------+----+
|2015/01/01 03:00:36|null|
|2015/01/01 03:01:06|null|
|2015/01/01 03:01:12|null|
|2015/01/01 03:01:20|null|
|2015/01/01 03:01:27|null|
+-------------------+----+
only showing top 5 rows
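The plain cast returns null because Spark's string-to-timestamp cast expects an ISO-style `yyyy-MM-dd HH:mm:ss` layout, and these values use slashes. The same format mismatch can be sketched in plain Python (no Spark needed; `strptime` plays the role of the cast's parser here):

```python
from datetime import datetime

s = "2015/01/01 03:00:36"

def parses(value, fmt):
    """Return True if value matches the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

iso_ok = parses(s, "%Y-%m-%d %H:%M:%S")    # the ISO layout the plain cast expects
slash_ok = parses(s, "%Y/%m/%d %H:%M:%S")  # the slash layout the data actually uses

print(iso_ok, slash_ok)  # False True
```

Because the ISO parse fails, the cast cannot interpret the string and yields null; an explicit format string (as in the replies below) fixes this.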

vaio
New Contributor II

I found the solution. It is as follows:

from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

df2 = df.select('ID', 'starting_timestamp', unix_timestamp('starting_timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))
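For intuition: `unix_timestamp` parses the string with the given pattern into epoch seconds (a long), and the cast turns those seconds back into a timestamp. The same round trip in plain Python (no Spark; the time is treated as naive/local purely for illustration):

```python
from datetime import datetime

s = "2014/12/31 18:00:36"

# Parse with an explicit pattern (analogue of unix_timestamp(col, "yyyy/MM/dd HH:mm:ss"))
dt = datetime.strptime(s, "%Y/%m/%d %H:%M:%S")
epoch_seconds = int(dt.timestamp())  # what unix_timestamp returns

# Re-interpret the seconds as a timestamp (analogue of .cast(TimestampType()))
restored = datetime.fromtimestamp(epoch_seconds)

print(restored == dt)  # True
```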

hariumesh
New Contributor II

Hi

I am facing the same problem with PySpark, where I am getting null after converting to timestamp. The dataset is similar to the one above, with some additional columns:

df2 = df.select('Customer', 'Transaction_Timestamp', 'Transaction_Base_Point_Value', unix_timestamp('Transaction_Timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))

 |-- Customer: string (nullable = true)
 |-- Transaction_Timestamp: string (nullable = true)
 |-- Transaction_Base_Point_Value: integer (nullable = true)
 |-- timestamp: timestamp (nullable = true)
But the output of the timestamp column is null.

mathan_pillai
Valued Contributor

Hi,

It is strange that it returns null; it works fine for me in PySpark as well. Could you please compare your code against the snippet? Also, try displaying the original dataframe, and make sure its values display properly and have the appropriate data type (StringType).

```
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType, StringType

df = spark.createDataFrame(["2015/01/01 03:00:36"], StringType()).toDF("ts_string")
df1 = df.select(unix_timestamp(df.ts_string, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
```

If that still doesn't resolve it, please share the full code, including how you are creating the original dataframe. Please let us know how it goes.

Thanks

Doha
New Contributor II

Hi,

I have Spark 1.6.0 on Cloudera 5.13.0.

I have the same problem; this is my full code. Please help me.

This is the format of my rows: 25/Jan/2016:21:26:37 +0100

from pyspark.sql import HiveContext
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

sqlContext = HiveContext(sc)
df = sqlContext.sql("select * from test.test")
df1 = df.select(unix_timestamp(df.date_hour, 'yyyy/MM/dd:HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()

It is still null.
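The pattern and the data disagree here: 'yyyy/MM/dd:HH:mm:ss' cannot match a row like 25/Jan/2016:21:26:37 +0100 (day first, abbreviated month name, and a timezone offset), which is why `unix_timestamp` returns null. In the SimpleDateFormat syntax Spark 1.6 uses, the matching pattern would presumably be 'dd/MMM/yyyy:HH:mm:ss Z'. A plain-Python check of that layout (no Spark needed):

```python
from datetime import datetime

row = "25/Jan/2016:21:26:37 +0100"

# strptime analogue of the SimpleDateFormat pattern "dd/MMM/yyyy:HH:mm:ss Z":
# day, abbreviated month name, year, time, and a numeric UTC offset.
ts = datetime.strptime(row, "%d/%b/%Y:%H:%M:%S %z")

print(ts.year, ts.month, ts.day)  # 2016 1 25
```

If this parses, the same layout passed to `unix_timestamp` should stop returning null.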

gideon
New Contributor II

Hope you don't mind if I ask you to elaborate further for a sharper understanding?
