Generating Spark SQL query using Python

Constantine
Contributor III

I have a Spark SQL notebook on Databricks with a SQL query like this:

SELECT *
FROM table_name
WHERE
    condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname'
    AND condition_2 = 'apple'
    AND condition_3 = 'orange'

There are a lot of conditions. Is there a way to define a Python function and link it to SQL so that it automatically generates the WHERE clause string?

Hubert-Dudek
Databricks MVP
from pyspark.sql.functions import col
fruits = ["apple", "orange"]
df.where(col("v").isin(fruits))  # keep rows whose v is one of the listed values

or

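fruits = ["apple", "orange"]
# Formatting the list directly yields ['apple', 'orange'], which is not valid SQL, so build the IN list by hand.
in_list = ", ".join("'{0}'".format(f) for f in fruits)
spark.sql("SELECT * FROM df WHERE v IN ({0})".format(in_list))  # assumes a table or view named df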
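
Building on the IN idea, here is a rough sketch of a Python function that generates the whole WHERE clause from a mapping of column names to allowed values. The names `build_where` and `quote_literal` are made up for illustration. It assumes the intent is "any of the condition_1 values AND the other conditions"; note that in SQL, AND binds tighter than OR, so the original query without parentheses would be evaluated differently. It also assumes all values are plain strings, and it rejects quotes and backslashes rather than guessing at escaping rules.

```python
import re
from typing import List, Mapping

# Only simple, unquoted column names are accepted in this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_literal(value: str) -> str:
    """Wrap a plain string in single quotes, refusing risky characters."""
    if "'" in value or "\\" in value:
        raise ValueError("unsupported character in value: {!r}".format(value))
    return "'{}'".format(value)


def build_where(conditions: Mapping[str, List[str]]) -> str:
    """Values for one column are ORed (via IN); different columns are ANDed."""
    parts = []
    for column, values in conditions.items():
        if not _IDENTIFIER.fullmatch(column):
            raise ValueError("unexpected column name: {!r}".format(column))
        if not values:
            raise ValueError("no values given for column {!r}".format(column))
        literals = ", ".join(quote_literal(v) for v in values)
        if len(values) == 1:
            parts.append("{} = {}".format(column, literals))
        else:
            parts.append("{} IN ({})".format(column, literals))
    # An empty mapping produces no WHERE clause at all.
    return "WHERE " + " AND ".join(parts) if parts else ""


where_clause = build_where({
    "condition_1": ["fname", "lname", "mname"],
    "condition_2": ["apple"],
    "condition_3": ["orange"],
})
query = "SELECT * FROM table_name {}".format(where_clause)
print(query)
```

In a notebook, the resulting string could then be passed to `spark.sql(query)`. For values that come from users, the DataFrame API with `isin` is the safer route, since it avoids building SQL text by hand.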


My blog: https://databrickster.medium.com/

jose_gonzalez
Databricks Employee

Hi @John Constantine,

I think you can also use arrays_overlap() for your OR statements; see the Spark SQL docs for that function.
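
A rough sketch of what that might look like, generating the predicate as a string from Python. The helper name `overlap_predicate` is hypothetical, and the sketch assumes `condition_1` is a plain string column, so it is wrapped in `array(...)` before being compared against the array of allowed values. It also assumes Spark 2.4 or later, where `arrays_overlap` is available.

```python
from typing import List


def overlap_predicate(column: str, values: List[str]) -> str:
    """Build an arrays_overlap(...) predicate that is true when the column matches any value."""
    for v in values:
        # Refuse characters that would need escaping instead of guessing the rules.
        if "'" in v or "\\" in v:
            raise ValueError("unsupported character in value: {!r}".format(v))
    literals = ", ".join("'{}'".format(v) for v in values)
    return "arrays_overlap(array({}), array({}))".format(column, literals)


predicate = overlap_predicate("condition_1", ["fname", "lname", "mname"])
query = "SELECT * FROM table_name WHERE {}".format(predicate)
print(query)
```

For a scalar column, a plain IN list is probably simpler; arrays_overlap seems like a better fit when the column itself already holds an array of values.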