11-18-2017 10:58 AM
I have a dataset with one column of string type (e.g. '2014/12/31 18:00:36'). How can I convert it to timestamp type with PySpark?
11-19-2017 01:26 PM
I am trying to do it this way; however, the result is null.
df2 = df.select(col('starting_timestamp'), df.starting_timestamp.cast('timestamp').alias('time'))
+-------------------+----+
| starting_timestamp|time|
+-------------------+----+
|2015/01/01 03:00:36|null|
|2015/01/01 03:01:06|null|
|2015/01/01 03:01:12|null|
|2015/01/01 03:01:20|null|
|2015/01/01 03:01:27|null|
+-------------------+----+
only showing top 5 rows
11-19-2017 01:42 PM
I found the solution. It is as follows:
df2 = df.select('ID', 'starting_timestamp', unix_timestamp('starting_timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))
03-25-2019 12:21 AM
Hi
I am facing the same problem with PySpark, where I am getting null after converting to timestamp. The dataset is similar to the one above, with some additional columns:
df2 = df.select('Customer', 'Transaction_Timestamp', 'Transaction_Base_Point_Value', unix_timestamp('Transaction_Timestamp', "yyyy/MM/dd HH:mm:ss").cast(TimestampType()).alias("timestamp"))
|-- Customer: string (nullable = true)
|-- Transaction_Timestamp: string (nullable = true)
|-- Transaction_Base_Point_Value: integer (nullable = true)
|-- timestamp: timestamp (nullable = true)
But the output of the timestamp column is null.
03-26-2019 05:29 AM
Hi,
It is strange that it returns null; it works fine for me in PySpark as well. Could you please compare the code? Also try displaying the original dataframe, and make sure its values display properly and have the appropriate datatype (StringType).
```
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType
from pyspark.sql.types import StringType
df = spark.createDataFrame(["2015/01/01 03:00:36"], StringType()).toDF("ts_string")
df1 = df.select(unix_timestamp(df.ts_string, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
```
If that still doesn't resolve it, please share the full code, including how you are creating the original dataframe. Please let us know how it goes.
Thanks
05-08-2019 06:49 AM
Hi
I have Spark 1.6.0 on Cloudera 5.13.0.
I have the same problem; this is my full code, please help me.
This is the format of my rows: 25/Jan/2016:21:26:37 +0100
from pyspark.sql import HiveContext
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType
from pyspark.sql.types import StringType
sqlContext = HiveContext(sc)
df = sqlContext.sql("select * from test.test")
df1 = df.select(unix_timestamp(df.date_hour, 'yyyy/MM/dd:HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
It is still null.
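The null here most likely comes from a pattern mismatch: the rows are Apache-access-log-style timestamps such as `25/Jan/2016:21:26:37 +0100`, but the code passes the pattern `yyyy/MM/dd:HH:mm:ss`. A sketch (assuming standard access-log timestamps) verifying the format in plain Python; the corresponding Spark/SimpleDateFormat pattern for `unix_timestamp` would be `dd/MMM/yyyy:HH:mm:ss Z`:

```python
from datetime import datetime

# Row format from the post: day/MonthName/year:HH:MM:SS offset.
# In Python strptime terms that is "%d/%b/%Y:%H:%M:%S %z"; the matching
# Spark/SimpleDateFormat pattern would be "dd/MMM/yyyy:HH:mm:ss Z",
# not "yyyy/MM/dd:HH:mm:ss" as used above.
ts = datetime.strptime("25/Jan/2016:21:26:37 +0100", "%d/%b/%Y:%H:%M:%S %z")
print(ts.isoformat())  # 2016-01-25T21:26:37+01:00
```

With the pattern corrected in the PySpark call, `unix_timestamp(df.date_hour, 'dd/MMM/yyyy:HH:mm:ss Z')` should parse these rows instead of returning null.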