<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to append new column values in dataframe behalf of unique id's in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29946#M21633</link>
    <description>&lt;P&gt;@Raela Wang How can I add a timestamp to every row in the DataFrame dynamically?&lt;/P&gt;&lt;P&gt;val date = new java.util.Date&lt;/P&gt;&lt;P&gt;val AppendDF = existingDF.withColumn("new_column_name",Column date)&lt;/P&gt;&lt;P&gt;This is not working for me.&lt;/P&gt;&lt;P&gt;Can you help?&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jan 2017 11:12:45 GMT</pubDate>
    <dc:creator>jackAKAkarthik</dc:creator>
    <dc:date>2017-01-09T11:12:45Z</dc:date>
    <item>
      <title>How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29941#M21628</link>
      <description>&lt;P&gt;I need to add a new column with data to a DataFrame.&lt;/P&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val test = sqlContext.createDataFrame(Seq((4L, "spark i j k"), (5L, "l m n"), (6L, "mapreduce spark"), (7L, "apache hadoop"), (11L, "a b c d e spark"), (12L, "b d"), (13L, "spark f g h"), (14L, "hadoop mapreduce"))).toDF("id", "text")
val tuples = List((0L, 0.9), (4L, 3.0), (6L, 0.12), (7L, 0.7), (11L, 0.15), (12L, 6.1), (13L, 1.8))
val rdd: RDD[(Long, Double)] = sparkContext.parallelize(tuples.toSeq)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each tuple holds an ID and an AVERAGE. I want to add a new column named Average to test, filling in the value for each row based on its ID, and generate a new DataFrame or RDD.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2016 09:47:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29941#M21628</guid>
      <dc:creator>supriya</dc:creator>
      <dc:date>2016-01-22T09:47:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29942#M21629</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Are you trying to add a new column to tuples?&lt;/P&gt;
&lt;P&gt;You would first have to convert tuples into a DataFrame, and this can be easily done:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import sqlContext.implicits._ // provides toDF on local collections
val tuplesDF = tuples.toDF("id", "average")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then you can use withColumn to create a new column:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;tuplesDF.withColumn("average2", tuplesDF.col("average") + 10)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Refer to the DataFrame documentation here:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame" target="test_blank"&gt;https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 17:59:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29942#M21629</guid>
      <dc:creator>raela</dc:creator>
      <dc:date>2016-01-29T17:59:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29943#M21630</link>
      <description>&lt;P&gt;Thanks @Raela Wang. But my requirement is different: I want to add an &lt;B&gt;Average&lt;/B&gt; column to the test DataFrame based on the id column. I know this is possible using a join, but I think the join would be too slow. If you have any other solution, please suggest it.&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jan 2016 13:11:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29943#M21630</guid>
      <dc:creator>supriya</dc:creator>
      <dc:date>2016-01-31T13:11:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29944#M21631</link>
      <description>&lt;P&gt;@supriya&lt;/P&gt;
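&lt;P&gt;You will have to do a join.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions._
import sqlContext.implicits._
val tuplesDF = tuples.toDF("tupleid", "average") // tuples is a List; join needs a DataFrame
val joined = test.join(tuplesDF, col("id") === col("tupleid"), "inner").select("id", "text", "average")&lt;/CODE&gt;&lt;/PRE&gt;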
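&lt;P&gt;If the join feels slow and the table of averages is small, one option worth trying is a broadcast join hint. This is only a minimal sketch, assuming Spark 2.x with a local SparkSession (on Spark 1.x, sqlContext plays the same role). The tupleid column name and the shortened sample data are just for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Assumption: Spark 2.x or later, running locally.
val spark = SparkSession.builder().master("local[*]").appName("broadcast-join-sketch").getOrCreate()
import spark.implicits._

// Shortened copies of the data from the question.
val test = Seq((4L, "spark i j k"), (5L, "l m n"), (6L, "mapreduce spark")).toDF("id", "text")
val tuplesDF = Seq((0L, 0.9), (4L, 3.0), (6L, 0.12)).toDF("tupleid", "average")

// broadcast() marks tuplesDF as small enough to copy to every executor,
// which lets Spark avoid shuffling the larger table for the join.
val joined = test.join(broadcast(tuplesDF), $"id" === $"tupleid", "inner").select("id", "text", "average")
joined.show()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;An inner join drops ids that have no average (id 5 here); using "left_outer" as the join type would keep them with a null average.&lt;/P&gt;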
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 01:08:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29944#M21631</guid>
      <dc:creator>raela</dc:creator>
      <dc:date>2016-02-03T01:08:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29945#M21632</link>
      <description>&lt;P&gt;You have given the method to copy the values of an existing column to a newly created column, but @supriya has asked a different question.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 11:10:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29945#M21632</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-09T11:10:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29946#M21633</link>
      <description>&lt;P&gt;@Raela Wang How can I add a timestamp to every row in the DataFrame dynamically?&lt;/P&gt;&lt;P&gt;val date = new java.util.Date&lt;/P&gt;&lt;P&gt;val AppendDF = existingDF.withColumn("new_column_name",Column date)&lt;/P&gt;&lt;P&gt;This is not working for me.&lt;/P&gt;&lt;P&gt;Can you help?&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 11:12:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29946#M21633</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-09T11:12:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29947#M21634</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://users/4482/jack-aka-karthik.html" target="_blank"&gt;@jack AKA karthik&lt;/A&gt;: To add a timestamp to a DataFrame dynamically:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions._
val AppendDF = customerDF.withColumn("new_column_name",current_timestamp())&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I think this will work for you.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 12:35:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29947#M21634</guid>
      <dc:creator>supriya</dc:creator>
      <dc:date>2017-01-09T12:35:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29948#M21635</link>
      <description>&lt;P&gt;@supriya&lt;/P&gt;&lt;P&gt;Thanks for the help. It worked.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 14:22:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29948#M21635</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-09T14:22:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29949#M21636</link>
      <description>&lt;P&gt;@supriya&lt;/P&gt;&lt;P&gt;How can I cast this current_timestamp() to a string type? My Hive version is older (0.13), and I am not able to load the timestamp into the table as it is.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 09:40:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29949#M21636</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-12T09:40:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29950#M21637</link>
      <description>&lt;P&gt;@Raela Wang&lt;/P&gt;&lt;P&gt;How can I convert current_timestamp() to a string in Scala? I have tried a few approaches, but no luck.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 09:49:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29950#M21637</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-12T09:49:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29951#M21638</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@jack karthik What have you tried? Have you tried cast()? &lt;/P&gt;
&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column" target="test_blank"&gt;https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;df.select(df("colA").cast("string"))&lt;/P&gt; 
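&lt;P&gt;Applied to the timestamp column from earlier in the thread, this could look roughly like the sketch below. It assumes Spark 2.x with a local SparkSession; the DataFrame, the column names ts_cast and ts_formatted, and the format pattern are made up for illustration. date_format is shown as an alternative when a specific string layout is wanted:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_timestamp, date_format}

// Assumption: Spark 2.x or later, running locally.
val spark = SparkSession.builder().master("local[*]").appName("timestamp-string-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input DataFrame standing in for the real one.
val df = Seq((1L, "a"), (2L, "b")).toDF("id", "text")

val withTs = df
  .withColumn("ts_cast", current_timestamp().cast("string"))                           // plain cast to string
  .withColumn("ts_formatted", date_format(current_timestamp(), "yyyy-MM-dd HH:mm:ss")) // explicit layout

withTs.printSchema() // both new columns are reported as string&lt;/CODE&gt;&lt;/PRE&gt;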
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 15:56:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29951#M21638</guid>
      <dc:creator>raela</dc:creator>
      <dc:date>2017-01-12T15:56:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29952#M21639</link>
      <description>&lt;P&gt;@Raela Wang&lt;/P&gt;&lt;P&gt;Yes, I used this after I posted the question; I forgot to update.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 16:20:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29952#M21639</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-12T16:20:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to append new column values in dataframe behalf of unique id's</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29953#M21640</link>
      <description>&lt;P&gt;@Raela Wang&lt;/P&gt;&lt;P&gt;I have used&lt;/P&gt;&lt;P&gt;val newDF = dataframe.withColumn("Timestamp_val",current_timestamp())&lt;/P&gt;&lt;P&gt;to add a new column to an existing DataFrame, but the job throws errors when I run it on YARN:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;How else can we add a column? Should we not create a new DataFrame when adding the column?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jan 2017 11:20:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-append-new-column-values-in-dataframe-behalf-of-unique-id/m-p/29953#M21640</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-17T11:20:42Z</dc:date>
    </item>
  </channel>
</rss>

