
How to append new column values to a DataFrame based on unique IDs

supriya
New Contributor II

I need to add a new column with data to a DataFrame.

Example:

val test = sqlContext.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "mapreduce spark"),
  (7L, "apache hadoop"),
  (11L, "a b c d e spark"),
  (12L, "b d"),
  (13L, "spark f g h"),
  (14L, "hadoop mapreduce")
)).toDF("id", "text")

import org.apache.spark.rdd.RDD

val tuples = List((0L, 0.9), (4L, 3.0), (6L, 0.12), (7L, 0.7), (11L, 0.15), (12L, 6.1), (13L, 1.8))
val rdd: RDD[(Long, Double)] = sparkContext.parallelize(tuples)

Each tuple holds an ID and an AVERAGE. I want to add a new column named Average to the test DataFrame, fill it in for every row by matching on ID, and generate a new DataFrame or RDD.

12 REPLIES

raela
Databricks Employee

Are you trying to add a new column to tuples?

You would first have to convert tuples into a DataFrame, and this can be easily done:

val tuplesDF = tuples.toDF("id", "average")

Then you can use withColumn to create a new column:

tuplesDF.withColumn("average2", tuplesDF.col("average") + 10)

Refer to the DataFrame documentation here:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame

supriya
New Contributor II

Thanks @Raela Wang. But my requirement is different: I want to add an Average column to the test DataFrame, matched on the id column. I know this is possible with a join, but I think a join is too slow. If you have another solution, please suggest it.

You have given the method to copy the values of an existing column into a newly created column, but @supriya has asked a different question.

raela
Databricks Employee

@supriya

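You will have to do a join.

import org.apache.spark.sql.functions._
// tuples is a List, so convert it to a DataFrame first (needs import sqlContext.implicits._); the key is named "tupleid" so it does not clash with "id"
val tuplesDF = tuples.toDF("tupleid", "average")
val joined = test.join(tuplesDF, col("id") === col("tupleid"), "inner").select("id", "text", "average")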
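On the concern that a join is too slow: when the lookup table is as small as the list of averages here, one common option is to hint Spark to broadcast it, so the larger DataFrame does not need to be shuffled. The following is only a sketch, assuming Spark 2.x or later (it uses SparkSession rather than the sqlContext from the question); the object name AddAverageColumn and the helper withAverage are hypothetical names made up for illustration. It uses a left join so that ids without an average (5 and 14 here) are kept with a null value, whereas the inner join above would drop them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper object, not part of Spark.
object AddAverageColumn {

  // Left join on "id": every row of `test` is kept, and ids that have no
  // entry in `averages` get a null average. broadcast() is only a hint that
  // the small side should be shipped to every executor.
  def withAverage(test: DataFrame, averages: DataFrame): DataFrame =
    test.join(broadcast(averages), Seq("id"), "left")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; on a cluster the
    // session is normally provided for you.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-average-column")
      .getOrCreate()
    import spark.implicits._

    val test = Seq(
      (4L, "spark i j k"),
      (5L, "l m n"),
      (6L, "mapreduce spark"),
      (7L, "apache hadoop"),
      (11L, "a b c d e spark"),
      (12L, "b d"),
      (13L, "spark f g h"),
      (14L, "hadoop mapreduce")
    ).toDF("id", "text")

    val averages = Seq(
      (0L, 0.9), (4L, 3.0), (6L, 0.12), (7L, 0.7),
      (11L, 0.15), (12L, 6.1), (13L, 1.8)
    ).toDF("id", "average")

    // Joining on Seq("id") yields a single "id" column in the result.
    withAverage(test, averages).orderBy("id").show(false)

    spark.stop()
  }
}
```

Whether the broadcast actually helps depends on the size of the lookup table and the cluster configuration, so it is worth comparing the query plans with joined.explain() before relying on it.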

jackAKAkarthik
New Contributor III

@Raela Wang How can I add a timestamp to every row in the DataFrame dynamically?

val date = new java.util.Date

val AppendDF = existingDF.withColumn("new_column_name",Column date)

This is not working for me.

Can you help?

@jack AKA karthik: To add a timestamp to a DataFrame dynamically:

import org.apache.spark.sql.functions._
val AppendDF = customerDF.withColumn("new_column_name",current_timestamp())

I think this will work for you.

@supriya

Thanks for the help. It worked.

@supriya

How can I cast this current_timestamp() to a string type? My Hive version is older (0.13), and I am not able to load the timestamp into the table as it is.

@Raela Wang

How can I convert current_timestamp() to a string in Scala? I have tried a few things but had no luck.

raela
Databricks Employee

@jack karthik What have you tried? Have you tried cast()?

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column

df.select(df("colA").cast("string"))
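Applied to the timestamp question, the cast() suggestion might look like the sketch below. It assumes Spark 2.x or later with a SparkSession; the object name TimestampAsString, the helper withTimestampStrings, the column names, and the format pattern are all hypothetical choices for illustration. It shows two options: a plain cast("string"), and date_format, which lets you pick the exact layout of the resulting string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{current_timestamp, date_format}

// Hypothetical helper object, not part of Spark.
object TimestampAsString {

  // Adds two string columns holding the current timestamp:
  //  - ts_cast:      the timestamp cast to string with Spark's default rendering
  //  - ts_formatted: the timestamp rendered with an explicit pattern
  def withTimestampStrings(df: DataFrame): DataFrame =
    df.withColumn("ts_cast", current_timestamp().cast("string"))
      .withColumn("ts_formatted",
        date_format(current_timestamp(), "yyyy-MM-dd HH:mm:ss"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestamp-as-string")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((4L, "spark i j k"), (5L, "l m n")).toDF("id", "text")

    val result = withTimestampStrings(df)
    result.printSchema() // both new columns are reported as string
    result.show(false)

    spark.stop()
  }
}
```

Because both new columns are ordinary strings, they should be loadable into a table that cannot accept the timestamp type directly, though that has not been checked here against Hive 0.13.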

jackAKAkarthik
New Contributor III

@Raela Wang

Yes, I used this after I posted the question and forgot to update.

jackAKAkarthik
New Contributor III

@Raela Wang

I have used

val newDF = dataframe.withColumn("Timestamp_val", current_timestamp())

to add a new column to an existing DataFrame, but it throws the following error when I run it on YARN:

java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)

How else can we add a column? Should we not create a new DataFrame when adding the column?
