How to extract the year and week number from a column in a SparkDataFrame?
09-20-2016 12:48 AM
I have the following SparkDataFrame:
sale_id / created_at
1 / 2016-05-28T05:53:31.042Z
2 / 2016-05-30T12:50:58.184Z
3 / 2016-05-23T10:22:18.858Z
4 / 2016-05-27T09:20:15.158Z
5 / 2016-05-21T08:30:17.337Z
6 / 2016-05-28T07:41:14.361Z
I need to add a year_week column that contains the year and week number for each row of the created_at column:
sale_id / created_at / year_week
1 / 2016-05-28T05:53:31.042Z / 2016-21
2 / 2016-05-30T12:50:58.184Z / 2016-22
3 / 2016-05-23T10:22:18.858Z / 2016-21
4 / 2016-05-27T09:20:15.158Z / 2016-21
5 / 2016-05-21T08:30:17.337Z / 2016-20
6 / 2016-05-28T07:41:14.361Z / 2016-21
PySpark, SparkR, or Spark SQL solutions are all welcome. I have already tried the lubridate package, but because my column is an S4 object I receive the following error:
Error in as.Date.default(head_df$created_at) :
  do not know how to convert 'head_df$created_at' to class “Date”
Labels: Pyspark, Spark, Sparkdataframe, Sparkr
12-19-2016 08:36 AM
import org.apache.spark.sql.functions.{concat_ws, weekofyear, year}

// read the CSV; inferSchema parses created_at as a timestamp
val data = spark.read.option("header", "true").option("inferSchema", "true").csv("location of file")

// year() extracts the year and weekofyear() the ISO week number;
// concat_ws joins them with "-" into the year_week column
val new_df = data.withColumn("year_week",
  concat_ws("-", year(data("created_at")), weekofyear(data("created_at"))))

new_df.show()
// NOTE: Code is in Scala. I did not test this in an IDE, but it should work fine.
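If you prefer the PySpark route the question asked about, a minimal, untested sketch of the same idea with the built-in year, weekofyear and concat_ws functions would look like this (the file path is a placeholder, and the SparkSession setup is only needed outside a notebook where spark already exists):

from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws, weekofyear, year

spark = SparkSession.builder.getOrCreate()

# read the CSV; inferSchema should parse created_at as a timestamp
df = spark.read.option("header", "true").option("inferSchema", "true").csv("location of file")

# weekofyear() returns the ISO-8601 week number, which matches the expected output (e.g. 2016-21)
df = df.withColumn("year_week", concat_ws("-", year("created_at"), weekofyear("created_at")))

df.show(truncate=False)

The same expression should also work as a Spark SQL select item: concat_ws('-', year(created_at), weekofyear(created_at)).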
12-19-2016 08:45 AM
This is how the documentation looks: