<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Split a row into multiple rows based on a column value in Spark SQL in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/split-a-row-into-multiple-rows-based-on-a-column-value-in-spark/m-p/28093#M19931</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am trying to split a record in a table into 2 records based on a column value. Please refer to the sample below. The input table shows 3 types of product and their prices. Notice that in each row only the column(s) corresponding to that row's Product hold a value; the other price columns are Null.&lt;/P&gt;
&lt;P&gt;My requirement is: whenever the &lt;B&gt;Product&lt;/B&gt; value in a row is composite (i.e. it contains more than one product, e.g. Bolt + Brush), the record must be split into two rows, one for each product. In this example, the 2nd row is split into 2 rows, one for "Bolt" and another for "Brush", each with its price taken from the corresponding column (here "Bolt" = $3.99 and "Brush" = $6.99).&lt;/P&gt;
&lt;P&gt;&lt;U&gt;Note&lt;/U&gt;: A composite Product value contains at most 2 products, as in this example (Bolt + Brush).&lt;/P&gt;
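&lt;P&gt;&lt;B&gt;&lt;U&gt;Input&lt;/U&gt;:&lt;/B&gt;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;CustId&lt;/TH&gt;&lt;TH&gt;Product&lt;/TH&gt;&lt;TH&gt;Hammer&lt;/TH&gt;&lt;TH&gt;Bolt&lt;/TH&gt;&lt;TH&gt;Brush&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1234&lt;/TD&gt;&lt;TD&gt;Hammer&lt;/TD&gt;&lt;TD&gt;$5.99&lt;/TD&gt;&lt;TD&gt;&lt;I&gt;Null&lt;/I&gt;&lt;/TD&gt;&lt;TD&gt;&lt;I&gt;Null&lt;/I&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;B&gt;7639&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;Bolt + Brush&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;I&gt;Null&lt;/I&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;$3.99&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;$6.99&lt;/B&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;6322&lt;/TD&gt;&lt;TD&gt;Brush&lt;/TD&gt;&lt;TD&gt;&lt;I&gt;Null&lt;/I&gt;&lt;/TD&gt;&lt;TD&gt;&lt;I&gt;Null&lt;/I&gt;&lt;/TD&gt;&lt;TD&gt;$6.99&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;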
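&lt;P&gt;&lt;B&gt;&lt;U&gt;Required Output:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;CustId&lt;/TH&gt;&lt;TH&gt;Product&lt;/TH&gt;&lt;TH&gt;Price&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1234&lt;/TD&gt;&lt;TD&gt;Hammer&lt;/TD&gt;&lt;TD&gt;$5.99&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;B&gt;7639&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;Bolt&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;$3.99&lt;/B&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;B&gt;7639&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;Brush&lt;/B&gt;&lt;/TD&gt;&lt;TD&gt;&lt;B&gt;$6.99&lt;/B&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;6322&lt;/TD&gt;&lt;TD&gt;Brush&lt;/TD&gt;&lt;TD&gt;$6.99&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;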
&lt;P&gt;Can anyone kindly help me solve this? It has to be solved using &lt;B&gt;Spark SQL only&lt;/B&gt;.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Thu, 25 Apr 2019 16:43:45 GMT</pubDate>
    <dc:creator>rishigc</dc:creator>
    <dc:date>2019-04-25T16:43:45Z</dc:date>
    <item>
      <title>Split a row into multiple rows based on a column value in Spark SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/split-a-row-into-multiple-rows-based-on-a-column-value-in-spark/m-p/28093#M19931</link>
      <description></description>
      <pubDate>Thu, 25 Apr 2019 16:43:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/split-a-row-into-multiple-rows-based-on-a-column-value-in-spark/m-p/28093#M19931</guid>
      <dc:creator>rishigc</dc:creator>
      <dc:date>2019-04-25T16:43:45Z</dc:date>
    </item>
    <item>
      <title>Re: Split a row into multiple rows based on a column value in Spark SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/split-a-row-into-multiple-rows-based-on-a-column-value-in-spark/m-p/28094#M19932</link>
      <description>&lt;P&gt;Hi @rishigc,&lt;/P&gt;
&lt;P&gt;You can use something like the following:&lt;/P&gt;
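&lt;PRE&gt;&lt;CODE&gt;SELECT CustId, explode(arrays_zip(split(Product, ' *[+] *'), split(Price, ' *[+] *'))) AS product_and_price FROM df
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions.{arrays_zip, explode, split}
// arrays_zip names the zipped struct fields "0" and "1", so they are renamed in select
df.withColumn("product_and_price", explode(arrays_zip(split($"Product", " *[+] *"), split($"Price", " *[+] *"))))
  .select($"CustId", $"product_and_price".getField("0").as("Product"), $"product_and_price".getField("1").as("Price"))
  .show()&lt;/CODE&gt;&lt;/PRE&gt;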
&lt;P&gt;Here &lt;CODE&gt;df&lt;/CODE&gt; is the DataFrame. The &lt;CODE&gt;split&lt;/CODE&gt; function splits each column into an array, giving an array of products and an array of prices. Because &lt;CODE&gt;split&lt;/CODE&gt; treats its second argument as a regular expression, a literal plus sign has to be written as &lt;CODE&gt;[+]&lt;/CODE&gt; or escaped; a bare &lt;CODE&gt;'+'&lt;/CODE&gt; is not a valid pattern. The 2 arrays are then merged by &lt;CODE&gt;arrays_zip&lt;/CODE&gt;, so that the Nth product is paired with the Nth price. Finally the merged array is exploded using &lt;CODE&gt;explode&lt;/CODE&gt;, so that each element in the array becomes a separate row.&lt;/P&gt;
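&lt;P&gt;The snippets above read a single &lt;CODE&gt;Price&lt;/CODE&gt; column. In the question's input, however, each price sits in its own &lt;CODE&gt;Hammer&lt;/CODE&gt;, &lt;CODE&gt;Bolt&lt;/CODE&gt; or &lt;CODE&gt;Brush&lt;/CODE&gt; column. Below is a minimal Spark SQL sketch for that layout. It assumes the input is registered as a table or temp view named &lt;CODE&gt;df&lt;/CODE&gt; (a hypothetical name) with the five columns shown in the question. It also assumes that composite values are separated by a plus sign. The approach explodes the split Product and then picks each price by product name, instead of zipping two arrays.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- Sketch: assumes a table/view named df with columns CustId, Product, Hammer, Bolt, Brush
SELECT CustId,
       p AS Product,
       -- take the price from the column whose name matches the exploded product
       CASE p WHEN 'Hammer' THEN Hammer
              WHEN 'Bolt'   THEN Bolt
              WHEN 'Brush'  THEN Brush
       END AS Price
FROM df
-- split() takes a regex: ' *[+] *' matches a literal plus with optional spaces around it
LATERAL VIEW explode(split(Product, ' *[+] *')) t AS p&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A single-product value such as "Hammer" splits into a one-element array, so that row passes through unchanged, while "Bolt + Brush" becomes two rows. Any product not listed in the &lt;CODE&gt;CASE&lt;/CODE&gt; gets a NULL price, so each additional product column needs its own WHEN branch.&lt;/P&gt;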
&lt;P&gt;Please let us know if it works.&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 26 Apr 2019 10:31:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/split-a-row-into-multiple-rows-based-on-a-column-value-in-spark/m-p/28094#M19932</guid>
      <dc:creator>mathan_pillai</dc:creator>
      <dc:date>2019-04-26T10:31:30Z</dc:date>
    </item>
  </channel>
</rss>

