Databricks Community

dineshg · ‎10-27-2022

I need to execute a union statement that is built dynamically and stored in a string variable. I have built the statement but am stuck on executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using PySpark in a Databricks notebook.

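from pyspark.sql.functions import col, collect_list, concat, concat_ws, lit, overlay

# Build one ".union(df<iRuleid>)" fragment per active AccountMasterInfo rule.
df1 = df.filter((col("vchDataSection") == "AccountMasterInfo") & (col("bActive") == 1)).withColumn("dfs", concat(lit(".union(df"), col("iRuleid"), lit(")")))

# Concatenate all fragments into a single string column.
df2 = df1.agg(concat_ws("", collect_list(col("dfs"))).alias("AccInfoRules")).withColumn("replacestr", lit(""))

# Drop the leading ".union(" (the first 7 characters).
df3 = df2.select(overlay("AccInfoRules", "replacestr", 1, 7).alias("overlayed"))

var_a = df3.collect()

var_a = var_a[0]["overlayed"]

# Remove the first ")" left over from the first fragment.
var_b = var_a.replace(")", "", 1)

print(var_b)

Output: df533.union(df534).union(df535).union(df536)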

Shalabh007 · ‎11-29-2022

@Dineshkumar Gopalakrishnan You can use Python's built-in exec() function to execute a Python statement held in a string, which in your case is the PySpark union statement. See the sample snippet below.

df1 = spark.sparkContext.parallelize([(1, 2, ["1", "2", "3"]), (1, 3, ["4", "1", "5", "6"]) , (2, 4, ["2"]),(2, 5, ["3"])]).toDF(["store", "count", "values"])
 
df2 = spark.sparkContext.parallelize([(3, 2, ["1", "2", "3"]), (3, 3, ["4", "1", "5", "6"]) , (4, 4, ["2"]),(4, 5, ["3"])]).toDF(["store", "count", "values"])
 
union_statement = "df = df1.union(df2)"
exec(union_statement)

The code above runs the PySpark union API on df1 and df2 and assigns the result to the DataFrame 'df'.
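One caveat: exec() only reliably creates the variable 'df' when it runs at the top level of a notebook cell. Inside a function, passing an explicit namespace dictionary and reading the result back out is more predictable. A minimal sketch, assuming the DataFrames are collected in a dict; FakeFrame is a hypothetical stand-in with only a union() method so the snippet runs without a Spark session, run_union is an illustrative helper name, and df533 to df536 are the names from the question's output.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame: only rows and union()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # Like DataFrame.union, keep duplicates and append the other rows.
        return FakeFrame(self.rows + other.rows)


def run_union(statement, frames):
    """Execute an assignment such as "df = df533.union(df534)" and return df."""
    namespace = dict(frames)    # copy: the names visible to the executed string
    exec(statement, namespace)  # the statement binds "df" inside namespace
    return namespace["df"]


# Assumed lookup of name -> frame; with Spark these would be real DataFrames.
frames = {
    "df533": FakeFrame([(533, "a")]),
    "df534": FakeFrame([(534, "b")]),
    "df535": FakeFrame([(535, "c")]),
    "df536": FakeFrame([(536, "d")]),
}

var_b = "df533.union(df534).union(df535).union(df536)"
result = run_union("df = " + var_b, frames)
print(result.rows)
```

exec() runs whatever the string contains, so the statement should only be assembled from trusted values such as your own rule table.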

Your dynamic string can contain a more complex union statement.
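If the only goal is to chain the unions, the string can be skipped entirely. This is an alternative to exec(), not what the answer above does. A sketch assuming the DataFrames are kept in a dict keyed by rule id; FakeFrame is a hypothetical stand-in so the snippet runs without Spark, and frames_by_rule and rule_ids are illustrative names.

```python
from functools import reduce


class FakeFrame:
    """Hypothetical stand-in for a DataFrame: only rows and union()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # Like DataFrame.union, keep duplicates and append the other rows.
        return FakeFrame(self.rows + other.rows)


# Assumed lookup of rule id -> frame; with Spark these would be real DataFrames.
frames_by_rule = {rule_id: FakeFrame([(rule_id,)]) for rule_id in (533, 534, 535, 536)}

# In the question these ids would come from the collected iRuleid values.
rule_ids = [533, 534, 535, 536]

# Equivalent to df533.union(df534).union(df535).union(df536).
combined = reduce(
    lambda left, right: left.union(right),
    (frames_by_rule[rule_id] for rule_id in rule_ids),
)
print(combined.rows)
```

reduce() raises a TypeError on an empty sequence, so an empty rule list needs its own handling.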

dineshg · ‎12-01-2022

Thank you @Shalabh Agarwal, the above solution worked for me.

Shalabh007 · ‎12-01-2022

Amazing, @Dineshkumar Gopalakrishnan!

Can you please click the "Select As Best" button if the information provided helped resolve your question?

pyspark - execute dynamically framed action statement stored in string variable
