How to pass column names in selectExpr through one or more string parameters in spark using scala?

SwapanSwapandee
New Contributor II

I am using a script for a CDC merge in Spark streaming. I want to pass the column names to selectExpr through a parameter, because the column names change for each table. When I pass the columns and the struct field through a string variable, I get the error: mismatched input ',' expecting <<EOF>>

Below is the piece of code I am trying to parameterize.

var filteredMicroBatchDF=microBatchOutputDF
.selectExpr("col1","col2","struct(offset,KAFKA_TS) as otherCols" )
.groupBy("col1","col2").agg(max("otherCols").as("latest"))
.selectExpr("col1","col2","latest.*")

Reference to script being emulated:

https://docs.databricks.com/_static/notebooks/merge-in-cdc.html

I have tried like below by passing column names in a variable and then reading in the selectExpr from these variables:

val keyCols ="col1","col2" 

val structCols ="struct(offset,KAFKA_TS) as otherCols"

var filteredMicroBatchDF=microBatchOutputDF.selectExpr(keyCols,structCols ).groupBy(keyCols).agg(max("otherCols").as("latest")).selectExpr(keyCols,"latest.*")

When I run the script, I get this error:

org.apache.spark.sql.streaming.StreamingQueryException:
mismatched input ',' expecting <<EOF>>

2 REPLIES

shyam_9
Valued Contributor

Hi @Swapan Swapandeep Marwaha,

Can you pass them as a Seq, as in the code below?

val keyCols = Seq("col1", "col2")
val structCols = Seq("struct(offset,KAFKA_TS) as otherCols")
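For anyone applying the Seq suggestion to the pipeline in the question: selectExpr takes a variable number of String arguments, so a Seq has to be expanded with `: _*`, and the String overload of groupBy takes the first column separately from the rest. Below is a minimal sketch of that. The object name, the helper `latestPerKey`, the local SparkSession, and the sample rows are all made up for illustration and are not from the original notebook.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.max

object SelectExprWithSeq {
  // Hypothetical helper: keeps the row with the greatest struct value per key.
  def latestPerKey(df: DataFrame, keyCols: Seq[String], structCols: Seq[String]): DataFrame =
    df
      // selectExpr(exprs: String*) needs the Seq expanded into varargs
      .selectExpr((keyCols ++ structCols): _*)
      // groupBy(col1: String, cols: String*) takes the head and the tail separately
      .groupBy(keyCols.head, keyCols.tail: _*)
      .agg(max("otherCols").as("latest"))
      .selectExpr((keyCols :+ "latest.*"): _*)

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions for this sketch only
    val spark = SparkSession.builder().master("local[1]").appName("selectExpr-seq").getOrCreate()
    import spark.implicits._

    val microBatchOutputDF = Seq(
      ("a", "x", 1L, 100L),
      ("a", "x", 2L, 200L),
      ("b", "y", 1L, 150L)
    ).toDF("col1", "col2", "offset", "KAFKA_TS")

    val keyCols = Seq("col1", "col2")
    val structCols = Seq("struct(offset,KAFKA_TS) as otherCols")

    latestPerKey(microBatchOutputDF, keyCols, structCols).show()
    spark.stop()
  }
}
```

Each element of the Seq is still parsed as one SQL expression, so every column or expression needs to be its own element.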

Hi @shyamspr,

Yes, I tried it like this and it works. However, what I want is to fill the Seq with column names read from a widget or a parameter file, and when I do that I get the error.
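A likely cause (an assumption, since the failing code is not shown in this thread) is that the value read from the widget or file arrives as one string such as "col1,col2". selectExpr parses each argument as a single expression, so a string that still contains a comma between column names produces the mismatched input ',' error. The sketch below splits the key-column parameter into separate names and keeps the struct expression as its own parameter, because it contains commas of its own. The object name, both helper functions, and the hard-coded parameter values are hypothetical.

```scala
object ColumnParams {
  // Split a comma-separated parameter value such as "col1, col2" into column names.
  def parseKeyCols(raw: String): Seq[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Build the argument list for selectExpr: key columns followed by the struct expression.
  // The struct expression is not split, since it has commas inside the parentheses.
  def projection(keyCols: Seq[String], structExpr: String): Seq[String] =
    keyCols :+ structExpr

  def main(args: Array[String]): Unit = {
    // Hypothetical values; in a Databricks notebook they might come from
    // dbutils.widgets.get("keyCols") or from a parameter file.
    val keyColsParam = "col1, col2"
    val structParam = "struct(offset,KAFKA_TS) as otherCols"

    val keyCols = parseKeyCols(keyColsParam)
    val exprs = projection(keyCols, structParam)

    // Each element is now one expression, ready for selectExpr(exprs: _*)
    exprs.foreach(println)
  }
}
```

The resulting sequences can then be passed as `selectExpr(exprs: _*)` and `groupBy(keyCols.head, keyCols.tail: _*)`.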

https://stackoverflow.com/questions/58576398/how-to-pass-column-names-in-selectexpr-through-one-or-m...

I have updated the Stack Overflow post above with the code I tried and the error I am getting. I would appreciate it if you could take a look and share any ideas for resolving this.

Thank You!
