Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PySpark: execute a dynamically built statement stored in a string variable

dineshg
New Contributor III

I need to execute a union statement that is built dynamically and stored in a string variable. I have built the union statement, but I am stuck on executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using PySpark in a Databricks notebook.

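from pyspark.sql.functions import col, collect_list, concat, concat_ws, lit, overlay

# Keep the active AccountMasterInfo rules and build ".union(dfN)" for each rule ID N
df1 = df.filter((col("vchDataSection") == "AccountMasterInfo") & (col("bActive") == 1)).withColumn("dfs", concat(lit(".union(df"), col("iRuleid"), lit(")")))

# Concatenate all fragments into one string (note: collect_list does not guarantee order)
df2 = df1.agg(concat_ws("", collect_list(col("dfs"))).alias("AccInfoRules")).withColumn("replacestr", lit(""))

# Drop the leading ".union(" (the first 7 characters)
df3 = df2.select(overlay("AccInfoRules", "replacestr", 1, 7).alias("overlayed"))

# Bring the single result value back to the driver
var_a = df3.collect()[0]["overlayed"]

# Remove the now-unmatched ")" after the first DataFrame name
var_b = var_a.replace(")", "", 1)

print(var_b)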

Output: df533.union(df534).union(df535).union(df536)
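As a side note, the same statement string can be built in plain Python from the rule IDs, without the overlay() and replace() steps. This is a minimal sketch, assuming the iRuleid values have already been collected into a Python list and that each rule ID N corresponds to a variable named dfN; the names `rule_ids` and `build_union_statement` are hypothetical. Because collect_list() does not guarantee order, the sketch sorts the IDs so the statement is deterministic.

```python
def build_union_statement(rule_ids):
    """Build 'dfA.union(dfB).union(dfC)...' from a list of rule IDs."""
    # Hypothetical convention: rule ID N maps to a DataFrame variable named dfN.
    names = [f"df{rule_id}" for rule_id in sorted(rule_ids)]
    if not names:
        raise ValueError("need at least one rule ID")
    # Start from the first DataFrame name and chain .union(...) for the rest.
    return names[0] + "".join(f".union({name})" for name in names[1:])


# In the notebook the IDs could come from the filtered rules DataFrame, e.g.
# rule_ids = [row["iRuleid"] for row in df1.select("iRuleid").collect()]
rule_ids = [533, 534, 535, 536]
print(build_union_statement(rule_ids))
# df533.union(df534).union(df535).union(df536)
```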

Accepted Solution

Shalabh007
Honored Contributor

@Dineshkumar Gopalakrishnan, Python's exec() function can execute a Python statement, which in your case is the PySpark union statement. See the sample code snippet below.

# Two small example DataFrames with the same schema (store, count, values)
df1 = spark.sparkContext.parallelize([(1, 2, ["1", "2", "3"]), (1, 3, ["4", "1", "5", "6"]), (2, 4, ["2"]), (2, 5, ["3"])]).toDF(["store", "count", "values"])

df2 = spark.sparkContext.parallelize([(3, 2, ["1", "2", "3"]), (3, 3, ["4", "1", "5", "6"]), (4, 4, ["2"]), (4, 5, ["3"])]).toDF(["store", "count", "values"])

# The union statement is held in a string; exec() runs it and assigns the result to df
union_statement = "df = df1.union(df2)"
exec(union_statement)

The code above executes the PySpark union API on df1 and df2 and assigns the result to the DataFrame df.
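One caveat: exec() assigns df in whatever namespace it runs in. At the top level of a notebook cell that is the global namespace, so df is visible afterwards; inside a function, a variable assigned by exec() is not reliably visible as a local afterwards. Passing an explicit dictionary avoids the ambiguity, and because the union string is an expression, eval() can return the result directly. This is a minimal sketch: `Frame` is a hypothetical stand-in for a DataFrame so the example runs without Spark, and `run_union_statement` is a hypothetical helper. With real DataFrames you would put df1, df2, etc. in the dictionary instead. Hiding builtins is not a security sandbox, so only evaluate strings you built yourself.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: union() appends rows, keeping duplicates."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        return Frame(self.rows + other.rows)


def run_union_statement(statement, frames):
    """Evaluate a union expression such as 'df1.union(df2)' against named frames."""
    # Only the names in `frames` are visible to the expression; builtins are hidden
    # (this limits accidents, it is not a security sandbox).
    namespace = {"__builtins__": {}}
    namespace.update(frames)
    return eval(statement, namespace)


df1 = Frame([(1, 2), (1, 3)])
df2 = Frame([(3, 2), (1, 3)])

# exec() with an explicit namespace: the assignment lands in that dict.
ns = {"df1": df1, "df2": df2}
exec("df = df1.union(df2)", ns)
df = ns["df"]

# eval() returns the value of the expression directly.
same = run_union_statement("df1.union(df2)", {"df1": df1, "df2": df2})
print(df.rows == same.rows)  # True
```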

Your dynamic string can also contain a more complex union statement, such as a chain of several .union() calls.
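If the goal is only to union a variable number of DataFrames, another option (a swapped-in technique, not the accepted answer's approach) is to skip the string entirely: keep the DataFrames in a dictionary keyed by rule ID and fold them together with functools.reduce. This is a minimal sketch; `frames_by_rule` and `union_all` are hypothetical names, and `Frame` is again a stand-in so the example runs without Spark. With real PySpark DataFrames, `union_all` works unchanged because it only calls `.union()`. Note that DataFrame.union matches columns by position, so unionByName() may be safer if the rule DataFrames can have their columns in different orders.

```python
from functools import reduce


class Frame:
    """Hypothetical stand-in for a DataFrame: union() appends rows, keeping duplicates."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        return Frame(self.rows + other.rows)


def union_all(frames_by_rule, rule_ids):
    """Union the frames for the given rule IDs, in ascending rule-ID order."""
    frames = [frames_by_rule[rule_id] for rule_id in sorted(rule_ids)]
    # reduce() applies left.union(right) pairwise: f1.union(f2).union(f3)...
    return reduce(lambda left, right: left.union(right), frames)


# In the notebook these would be DataFrames such as df533, df534, ...
frames_by_rule = {
    533: Frame([("a", 1)]),
    534: Frame([("b", 2)]),
    535: Frame([("c", 3)]),
}
result = union_all(frames_by_rule, [535, 533, 534])
print(result.rows)  # [('a', 1), ('b', 2), ('c', 3)]
```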

dineshg
New Contributor III

Thank you, @Shalabh Agarwal. The above solution worked for me.

Shalabh007
Honored Contributor

Amazing, @Dineshkumar Gopalakrishnan!

Could you please click the "Select As Best" button if the information provided helped resolve your question?
