Convert String to Timestamp

vaio
New Contributor II

I have a dataset with one column of string type (e.g. '2014/12/31 18:00:36'). How can I convert it to timestamp type with PySpark?

6 REPLIES

vaio
New Contributor II

I am trying to do it this way, but the result is null:

from pyspark.sql.functions import col

df2 = df.select(col('starting_timestamp'), df.starting_timestamp.cast('timestamp').alias('time'))

+-------------------+----+
| starting_timestamp|time|
+-------------------+----+
|2015/01/01 03:00:36|null|
|2015/01/01 03:01:06|null|
|2015/01/01 03:01:12|null|
|2015/01/01 03:01:20|null|
|2015/01/01 03:01:27|null|
+-------------------+----+
only showing top 5 rows
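A likely reason for the nulls: Spark's plain cast from string to timestamp only recognizes ISO-style values such as `2015-01-01 03:00:36`; a slash-separated string doesn't match, and a failed cast produces null instead of an error. A minimal plain-Python sketch of that strict-parse-or-null behaviour (stdlib `datetime` used purely as an illustration; Spark itself uses Java date patterns, not `strptime`):

```python
from datetime import datetime

def parse_iso(s):
    """Mimic Spark's cast-to-timestamp: accept only the ISO-style
    'YYYY-MM-DD HH:MM:SS' layout, return None (null) otherwise."""
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

print(parse_iso("2015-01-01 03:00:36"))  # parses: 2015-01-01 03:00:36
print(parse_iso("2015/01/01 03:00:36"))  # None: slashes don't match the expected layout
```

This is why the cast above silently yields a column of nulls rather than raising.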

vaio
New Contributor II

I found the solution. It is as follows:

from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

df2 = df.select('ID', 'starting_timestamp', unix_timestamp('starting_timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))
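The `unix_timestamp` route works because the explicit pattern tells Spark how to read the slash-separated string; the value goes through epoch seconds and is then cast back to a timestamp. A rough stdlib sketch of that round trip (pure Python, assuming UTC for the illustration; Spark itself applies the session timezone):

```python
from datetime import datetime, timezone

def to_epoch(s, fmt="%Y/%m/%d %H:%M:%S"):
    """Parse with an explicit pattern and return epoch seconds,
    mirroring unix_timestamp(col, 'yyyy/MM/dd HH:mm:ss')."""
    return int(datetime.strptime(s, fmt).replace(tzinfo=timezone.utc).timestamp())

epoch = to_epoch("2015/01/01 03:00:36")
# Converting epoch seconds back to a timestamp, as .cast(TimestampType()) does:
print(datetime.fromtimestamp(epoch, tz=timezone.utc))  # 2015-01-01 03:00:36+00:00
```

On Spark 2.2 and later, `pyspark.sql.functions.to_timestamp('starting_timestamp', 'yyyy/MM/dd HH:mm:ss')` collapses the parse-and-cast into a single call.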

hariumesh
New Contributor II

Hi

I am facing the same problem with PySpark, where I am getting null after the conversion to timestamp. The dataset is similar to the one above, with some additional columns.

df2 = df.select('Customer', 'Transaction_Timestamp', 'Transaction_Base_Point_Value', unix_timestamp('Transaction_Timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))

 |-- Customer: string (nullable = true)
 |-- Transaction_Timestamp: string (nullable = true)
 |-- Transaction_Base_Point_Value: integer (nullable = true)
 |-- timestamp: timestamp (nullable = true)

But the output of the timestamp column is still null.

mathan_pillai
Valued Contributor

Hi,

It is strange that it returns null; it works fine for me in PySpark as well. Could you please compare your code against the snippet below? Also try displaying the original dataframe: please make sure its values display properly and are of the appropriate datatype (StringType).

```
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType, StringType

df = spark.createDataFrame(["2015/01/01 03:00:36"], StringType()).toDF("ts_string")
df1 = df.select(unix_timestamp(df.ts_string, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
```

If that still doesn't resolve it, please share your full code, including how you are creating the original dataframe. Please let us know how it goes.

Thanks

Doha
New Contributor II

Hi

I have Spark 1.6.0 on Cloudera 5.13.0.

I have the same problem. This is my full code, please help me.

This is the format of my rows: 25/Jan/2016:21:26:37 +0100

from pyspark.sql import HiveContext
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType, StringType

sqlContext = HiveContext(sc)
df = sqlContext.sql("select * from test.test")
df1 = df.select(unix_timestamp(df.date_hour, 'yyyy/MM/dd:HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()

It is still null.
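A likely cause of the null here is that the pattern does not match the data: `yyyy/MM/dd:HH:mm:ss` expects something like `2016/01/25:21:26:37`, whereas the rows look like `25/Jan/2016:21:26:37 +0100` (Apache access-log style). In the Java SimpleDateFormat terms Spark uses, the matching pattern would presumably be `dd/MMM/yyyy:HH:mm:ss Z`. A stdlib sketch of the same mismatch, using Python's `strptime` directives as stand-ins for the Java pattern letters:

```python
from datetime import datetime

row = "25/Jan/2016:21:26:37 +0100"

# The pattern from the post, translated to strptime terms, fails on this row:
try:
    datetime.strptime(row, "%Y/%m/%d:%H:%M:%S")
except ValueError:
    print("pattern does not match")  # the same mismatch makes Spark emit null

# The strptime analogue of 'dd/MMM/yyyy:HH:mm:ss Z' (day/abbreviated
# month/year, colon-separated time, numeric UTC offset) parses it:
ts = datetime.strptime(row, "%d/%b/%Y:%H:%M:%S %z")
print(ts.isoformat())  # 2016-01-25T21:26:37+01:00
```

So in the `unix_timestamp` call above, try `'dd/MMM/yyyy:HH:mm:ss Z'` in place of `'yyyy/MM/dd:HH:mm:ss'`.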

gideon
New Contributor II

Hope you don't mind if I ask you to elaborate further, for a sharper understanding?
