<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to pass column names in selectExpr through one or more string parameters in spark using scala? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27650#M19511</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am using a script for CDC merge in Spark streaming. I want to pass the column names to selectExpr through a parameter, because the column names differ for each table. When I pass the columns and the struct field through a string variable, I get the error: mismatched input ',' expecting &amp;lt;&amp;lt;EOF&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Below is the piece of code I am trying to parameterize.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;var filteredMicroBatchDF=microBatchOutputDF
.selectExpr("col1","col2","struct(offset,KAFKA_TS) as otherCols" )
.groupBy("col1","col2").agg(max("otherCols").as("latest"))
.selectExpr("col1","col2","latest.*")
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Reference to the script being emulated: &lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/_static/notebooks/merge-in-cdc.html" target="test_blank"&gt;https://docs.databricks.com/_static/notebooks/merge-in-cdc.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I tried storing the column names in variables and then passing those variables to selectExpr, as below: &lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;val keyCols ="col1","col2" &lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;val structCols ="struct(offset,KAFKA_TS) as otherCols"&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;var filteredMicroBatchDF=microBatchOutputDF.selectExpr(keyCols,structCols ).groupBy(keyCols).agg(max("otherCols").as("latest")).selectExpr(keyCols,"latest.*")&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;When I run the script, it fails with this error: &lt;PRE&gt;&lt;CODE&gt;org.apache.spark.sql.streaming.StreamingQueryException:&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;mismatched input ',' expecting &amp;lt;&amp;lt;EOF&amp;gt;&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 27 Oct 2019 03:28:02 GMT</pubDate>
    <dc:creator>SwapanSwapandee</dc:creator>
    <dc:date>2019-10-27T03:28:02Z</dc:date>
    <item>
      <title>How to pass column names in selectExpr through one or more string parameters in spark using scala?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27650#M19511</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am using a script for CDC merge in Spark streaming. I want to pass the column names to selectExpr through a parameter, because the column names differ for each table. When I pass the columns and the struct field through a string variable, I get the error: mismatched input ',' expecting &amp;lt;&amp;lt;EOF&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Below is the piece of code I am trying to parameterize.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;var filteredMicroBatchDF=microBatchOutputDF
.selectExpr("col1","col2","struct(offset,KAFKA_TS) as otherCols" )
.groupBy("col1","col2").agg(max("otherCols").as("latest"))
.selectExpr("col1","col2","latest.*")
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Reference to the script being emulated: &lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/_static/notebooks/merge-in-cdc.html" target="test_blank"&gt;https://docs.databricks.com/_static/notebooks/merge-in-cdc.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I tried storing the column names in variables and then passing those variables to selectExpr, as below: &lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;val keyCols ="col1","col2" &lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;val structCols ="struct(offset,KAFKA_TS) as otherCols"&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;var filteredMicroBatchDF=microBatchOutputDF.selectExpr(keyCols,structCols ).groupBy(keyCols).agg(max("otherCols").as("latest")).selectExpr(keyCols,"latest.*")&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt;
&lt;P&gt;When I run the script, it fails with this error: &lt;PRE&gt;&lt;CODE&gt;org.apache.spark.sql.streaming.StreamingQueryException:&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;mismatched input ',' expecting &amp;lt;&amp;lt;EOF&amp;gt;&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;/P&gt; 
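&lt;P&gt;A minimal sketch of one way to parameterize this, assuming the key columns arrive as a single comma-separated string such as "col1,col2" (for example from a widget or a parameter file). If that whole string is passed to selectExpr as one argument, Spark most likely parses it as a single expression, which would explain the mismatched input ',' error. The helper names parseCols and latestPerKey are hypothetical, and the struct expression is kept as a separate parameter because it contains commas itself.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max}

// Hypothetical helper: split a comma-separated parameter string
// into a Seq of trimmed, non-empty column names.
def parseCols(param: String): Seq[String] =
  param.split(",").map(_.trim).filter(_.nonEmpty).toSeq

// Hypothetical helper: keep the latest row per key.
// Assumes structExpr aliases its struct as "otherCols", as in the snippet above.
def latestPerKey(df: DataFrame, keyParam: String, structExpr: String): DataFrame = {
  val keyCols = parseCols(keyParam)
  // ": _*" expands the Seq so each expression is passed as its own argument
  df.selectExpr((keyCols :+ structExpr): _*)
    .groupBy(keyCols.map(col): _*)
    .agg(max("otherCols").as("latest"))
    .selectExpr((keyCols :+ "latest.*"): _*)
}
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Under these assumptions, latestPerKey(microBatchOutputDF, "col1,col2", "struct(offset,KAFKA_TS) as otherCols") should be equivalent to the hard-coded version above.&lt;/P&gt;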
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Oct 2019 03:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27650#M19511</guid>
      <dc:creator>SwapanSwapandee</dc:creator>
      <dc:date>2019-10-27T03:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to pass column names in selectExpr through one or more string parameters in spark using scala?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27651#M19512</link>
      <description>&lt;P&gt;Hi @Swapan Swapandeep Marwaha, &lt;/P&gt;&lt;P&gt;Can you pass them as a Seq, as in the code below? &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val keyCols = Seq("col1", "col2")
val structCols = Seq("struct(offset,KAFKA_TS) as otherCols")&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Oct 2019 05:40:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27651#M19512</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-10-29T05:40:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to pass column names in selectExpr through one or more string parameters in spark using scala?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27652#M19513</link>
      <description>&lt;P&gt;Hi @shyamspr,&lt;/P&gt;&lt;P&gt;Yes, I tried this and it works. What I want, though, is to populate the Seq with column names read from a widget or a parameter file, and when I do that I get the error. &lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/58576398/how-to-pass-column-names-in-selectexpr-through-one-or-more-string-parameters-in" alt="https://stackoverflow.com/questions/58576398/how-to-pass-column-names-in-selectexpr-through-one-or-more-string-parameters-in" target="_blank"&gt;https://stackoverflow.com/questions/58576398/how-to-pass-column-names-in-selectexpr-through-one-or-more-string-parameters-in&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I have updated the Stack Overflow post above with the code I tried and the error I am getting. I would appreciate it if you could take a look and share any ideas for resolving this. &lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 30 Oct 2019 01:11:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-pass-column-names-in-selectexpr-through-one-or-more/m-p/27652#M19513</guid>
      <dc:creator>SwapanSwapandee</dc:creator>
      <dc:date>2019-10-30T01:11:58Z</dc:date>
    </item>
  </channel>
</rss>

