
Generating Spark SQL query using Python

Constantine
Contributor III

I have a Spark SQL notebook on Databricks that contains a SQL query like this:

SELECT *
FROM table_name
WHERE
    condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname'
    AND condition_2 = 'apple'
    AND condition_3 = 'orange'

There are a lot of conditions. Is there a way to define a Python function and link it to SQL so that the WHERE clause string is generated automatically?

1 ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III
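from pyspark.sql.functions import col

fruits = ["apple", "orange"]
df.where(col("v").isin(fruits))

or

fruits = ["apple", "orange"]
# Assumes "df" is the name of a table or temp view.
# A list would format as ['apple', 'orange'], which is not valid SQL;
# a tuple with two or more items formats as ('apple', 'orange').
sqlContext.sql("SELECT * FROM df WHERE v IN {0}".format(tuple(fruits)))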
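To come back to the original question about many conditions, here is a minimal sketch of a helper that builds the whole WHERE clause from a dict. The names `sql_literal` and `build_where_clause` and the dict layout are hypothetical, not part of Spark. It assumes every column's conditions are combined with AND and that a list of values means "any of these". Values containing quotes or backslashes are rejected rather than escaped, which is a deliberate simplification.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal; strings are single-quoted."""
    if isinstance(value, str):
        # Simplification: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported characters in value: {value!r}")
        return f"'{value}'"
    return str(value)


def build_where_clause(conditions):
    """Build a WHERE clause from {column: value or list/tuple of values}.

    A list or tuple becomes `col IN (...)`, a single value becomes `col = ...`.
    The per-column predicates are joined with AND.
    """
    predicates = []
    for column, values in conditions.items():
        if isinstance(values, (list, tuple)):
            rendered = ", ".join(sql_literal(v) for v in values)
            predicates.append(f"{column} IN ({rendered})")
        else:
            predicates.append(f"{column} = {sql_literal(values)}")
    return "WHERE " + " AND ".join(predicates) if predicates else ""


conditions = {
    "condition_1": ["fname", "lname", "mname"],
    "condition_2": "apple",
    "condition_3": "orange",
}
query = f"SELECT * FROM table_name {build_where_clause(conditions)}"
print(query)
```

In a notebook, the resulting string could then be passed to `spark.sql(query)`. If the values come from user input, the DataFrame form with `isin` is likely the safer choice, since it avoids building SQL text by hand.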

2 REPLIES 2

jose_gonzalez
Moderator

Hi @John Constantine,

I think you can also use arrays_overlap() for your OR conditions; see the Spark SQL documentation for that function.
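As a rough sketch of how that suggestion might look, the helper below generates an `arrays_overlap` predicate that wraps the column in a one-element array and compares it with an array of the allowed values. The function name `overlap_predicate` is hypothetical, and the sketch only handles string values without quotes or backslashes.

```python
def overlap_predicate(column, values):
    """Return a Spark SQL predicate that holds when `column` matches any of `values`."""
    for v in values:
        # Simplification: only plain strings that need no escaping are accepted.
        if not isinstance(v, str) or "'" in v or "\\" in v:
            raise ValueError(f"unsupported value: {v!r}")
    literals = ", ".join(f"'{v}'" for v in values)
    return f"arrays_overlap(array({column}), array({literals}))"


predicate = overlap_predicate("condition_1", ["fname", "lname", "mname"])
print(predicate)
```

For a plain scalar column, `IN (...)` is probably simpler to read; `arrays_overlap` seems a more natural fit when the column is already an array.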
