Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to pass column names in selectExpr through one or more string parameters in spark using scala?

SwapanSwapandee
New Contributor II

I am using a script for a CDC merge in Spark streaming. I want to pass the column names to selectExpr through a parameter, because the column names change for each table. When I pass the columns and the struct field through a string variable, I get the error: mismatched input ',' expecting <<EOF>>

Below is the piece of code I am trying to parameterize.

var filteredMicroBatchDF=microBatchOutputDF
.selectExpr("col1","col2","struct(offset,KAFKA_TS) as otherCols" )
.groupBy("col1","col2").agg(max("otherCols").as("latest"))
.selectExpr("col1","col2","latest.*")

Reference to script being emulated:

https://docs.databricks.com/_static/notebooks/merge-in-cdc.html

I tried the following, putting the column names in variables and then passing those variables to selectExpr:

val keyCols ="col1","col2" 

val structCols ="struct(offset,KAFKA_TS) as otherCols"

var filteredMicroBatchDF=microBatchOutputDF.selectExpr(keyCols,structCols ).groupBy(keyCols).agg(max("otherCols").as("latest")).selectExpr(keyCols,"latest.*")

When I run the script, it fails with this error:

org.apache.spark.sql.streaming.StreamingQueryException:
mismatched input ',' expecting <<EOF>>

2 REPLIES

shyam_9
Databricks Employee

Hi @Swapan Swapandeep Marwaha,

Can you pass them as a Seq, as in the code below?

val keyCols = Seq("col1", "col2")
val structCols = Seq("struct(offset,KAFKA_TS) as otherCols")
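
For reference, here is a minimal sketch of how Seq values like these could be plugged into the original pipeline. It assumes the same column names as in the question; the helper name latestPerKey is made up for illustration. Since selectExpr and groupBy take varargs, the Seq has to be expanded with `: _*`.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max}

// Hypothetical helper (not part of the referenced notebook): keeps the latest
// row per key, with the column lists passed in as Seq[String].
def latestPerKey(df: DataFrame, keyCols: Seq[String], structCols: Seq[String]): DataFrame =
  df.selectExpr((keyCols ++ structCols): _*)   // expand the Seq into varargs, one expression per element
    .groupBy(keyCols.map(col): _*)             // turn the key names into Columns for groupBy
    .agg(max("otherCols").as("latest"))        // assumes structCols aliases the struct as otherCols
    .selectExpr((keyCols :+ "latest.*"): _*)   // key columns plus the unpacked struct fields
```

It could then be called as latestPerKey(microBatchOutputDF, keyCols, structCols).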

Hi @shyamspr,

Yes, I tried it like this and it works. What I want, though, is to fill the Seq with column names read from a widget or a parameter file, and when I do that I get the error.
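
One likely cause, offered as a guess rather than a confirmed diagnosis: a value read from a widget or file arrives as a single string such as "col1,col2", and selectExpr parses each argument as one expression, so the comma inside it triggers the parser error. A minimal sketch of one way around this is to split the plain column list into a Seq before using it. The parameter values and the name parseColumnList below are hypothetical; in a notebook the strings might come from something like dbutils.widgets.get.

```scala
// Hypothetical parameter values, standing in for what a widget or a
// parameter file might return.
val keyColsParam = "col1, col2"
val structColsParam = "struct(offset,KAFKA_TS) as otherCols"

// Split a plain comma-separated column list, trimming whitespace and
// dropping empty entries.
def parseColumnList(param: String): Seq[String] =
  param.split(",").map(_.trim).filter(_.nonEmpty).toList

val keyCols: Seq[String] = parseColumnList(keyColsParam)

// The struct expression contains commas of its own, so it is kept whole
// instead of being split.
val structCols: Seq[String] = Seq(structColsParam.trim)
```

The resulting values can then be expanded as varargs, for example selectExpr((keyCols ++ structCols): _*). Keeping the struct expression in its own parameter avoids having to split on commas that sit inside parentheses.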

https://stackoverflow.com/questions/58576398/how-to-pass-column-names-in-selectexpr-through-one-or-m...

I have updated the Stack Overflow post above with the code I tried and the error I am getting. I would appreciate it if you could take a look and suggest any ideas for resolving this.

Thank You!
