11-18-2017 10:58 AM
I have a dataset with one column of string type (e.g. '2014/12/31 18:00:36'). How can I convert it to timestamp type with PySpark?
11-19-2017 01:26 PM
I tried it this way; however, the result is null.
```
df2 = df.select(col('starting_timestamp'), df.starting_timestamp.cast('timestamp').alias('time'))
```
+-------------------+----+
| starting_timestamp|time|
+-------------------+----+
|2015/01/01 03:00:36|null|
|2015/01/01 03:01:06|null|
|2015/01/01 03:01:12|null|
|2015/01/01 03:01:20|null|
|2015/01/01 03:01:27|null|
+-------------------+----+
only showing top 5 rows
11-19-2017 01:42 PM
I found the solution. It is as follows:
```
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

df2 = df.select('ID', 'starting_timestamp', unix_timestamp('starting_timestamp', 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias('timestamp'))
```
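For context on why the plain `cast('timestamp')` in the earlier attempt came back null: Spark's direct cast only understands the default 'yyyy-MM-dd HH:mm:ss' layout, so a slash-separated string needs an explicit pattern, which `unix_timestamp` accepts. A minimal illustration of the pattern itself using plain Python's `strptime` (an analogy only; Spark uses Java SimpleDateFormat-style patterns):

```python
from datetime import datetime

# '%Y/%m/%d %H:%M:%S' is the strptime analogue of the Java/Spark
# pattern 'yyyy/MM/dd HH:mm:ss' passed to unix_timestamp above.
parsed = datetime.strptime("2015/01/01 03:00:36", "%Y/%m/%d %H:%M:%S")
print(parsed.isoformat())  # 2015-01-01T03:00:36
```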
03-25-2019 12:21 AM
Hi,
I am facing the same problem in PySpark, where I am getting null after the conversion to timestamp. The dataset is similar to the one above, with some additional columns:
```
df2 = df.select('Customer', 'Transaction_Timestamp', 'Transaction_Base_Point_Value', unix_timestamp('Transaction_Timestamp', 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias('timestamp'))
```
|-- Customer: string (nullable = true)
|-- Transaction_Timestamp: string (nullable = true)
|-- Transaction_Base_Point_Value: integer (nullable = true)
|-- timestamp: timestamp (nullable = true)
But the output of the timestamp column is still null.
03-26-2019 05:29 AM
Hi,
It is strange that it returns null; it works fine for me in PySpark as well. Could you please compare the code? Also try displaying the earlier dataframe, and make sure the values in the original dataframe display properly and are in the appropriate datatype (StringType).
```
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType
from pyspark.sql.types import StringType
df = spark.createDataFrame(["2015/01/01 03:00:36"], StringType()).toDF("ts_string")
df1 = df.select(unix_timestamp(df.ts_string, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
```
If that still doesn't resolve it, please share the full code, including how you are creating the original dataframe. Please let us know how it goes.
Thanks
05-08-2019 06:49 AM
Hi,
I have Spark 1.6.0 on Cloudera 5.13.0.
I have the same problem; this is my full code, please help me.
This is the format of my rows: 25/Jan/2016:21:26:37 +0100
```
from pyspark.sql import HiveContext
from pyspark.sql.functions import unix_timestamp, col
from pyspark.sql.types import TimestampType
from pyspark.sql.types import StringType

sqlContext = HiveContext(sc)
df = sqlContext.sql("select * from test.test")
df1 = df.select(unix_timestamp(df.date_hour, 'yyyy/MM/dd:HH:mm:ss').cast(TimestampType()).alias("timestamp"))
df1.show()
```
It is still null.
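The likely cause here is that the pattern 'yyyy/MM/dd:HH:mm:ss' does not match a row like 25/Jan/2016:21:26:37 +0100 (day first, abbreviated month name, trailing UTC offset), and `unix_timestamp` returns null on any mismatch. A Spark pattern along the lines of 'dd/MMM/yyyy:HH:mm:ss Z' should match this layout; this is an assumption, illustrated below with Python's `strptime` analogue rather than Spark itself:

```python
from datetime import datetime

# '%d/%b/%Y:%H:%M:%S %z' is the strptime analogue of the assumed
# Java/Spark pattern 'dd/MMM/yyyy:HH:mm:ss Z'.
parsed = datetime.strptime("25/Jan/2016:21:26:37 +0100", "%d/%b/%Y:%H:%M:%S %z")
print(parsed.isoformat())  # 2016-01-25T21:26:37+01:00
```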