Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Confusion in string comparison

Sas
New Contributor II

Hello expert

I am new to Spark. I am using the same piece of code in two places but getting different results.

When I use the piece of code below, I get this error:

py4j.Py4JException: Method or([class java.lang.String]) does not exist

df.filter(F.col("state").isNull()
     | F.col("state")==""
     | F.col("state").contains("")
     | F.col("number").isNull()).show()

However, when I use the piece of code below, it works fine:

df.withColumn("state",
       F.when(F.col("state")=="",None)
       .otherwise(F.col("state"))).show()

The same F.col("state")=="" expression works in one place but not in the other.

1 ACCEPTED SOLUTION

Accepted Solutions

pvignesh92
Honored Contributor

@Saswata Dutta​ Welcome to the club. Wish you a great time with Spark.

When you combine conditions in filter with | or &, each equality check needs its own parentheses, because Python's | and & operators bind more tightly than ==.

# Filter on multiple conditions
df.filter( (df.state  == "OH") & (df.gender  == "M") ) \
    .show(truncate=False)  
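To see why the parentheses matter, it can help to look at how Python itself groups the expression before Spark is ever involved. The following is a minimal sketch that uses only the standard-library ast module (Python 3.9+ for ast.unparse); the names a, b and c are placeholders for column expressions, and grouping is a helper written for this illustration, not part of PySpark.

```python
import ast

# Operators this sketch knows how to print; anything else raises KeyError.
_BIN_OPS = {ast.BitOr: "|", ast.BitAnd: "&"}
_CMP_OPS = {ast.Eq: "=="}


def _show(node: ast.AST) -> str:
    """Render a node with explicit parentheses around every operation."""
    if isinstance(node, ast.BinOp):
        op = _BIN_OPS[type(node.op)]
        return f"({_show(node.left)} {op} {_show(node.right)})"
    if isinstance(node, ast.Compare):
        parts = [_show(node.left)]
        for op, comparator in zip(node.ops, node.comparators):
            parts.append(_CMP_OPS[type(op)])
            parts.append(_show(comparator))
        return "(" + " ".join(parts) + ")"
    return ast.unparse(node)


def grouping(expr: str) -> str:
    """Return expr with Python's implicit operator grouping made explicit."""
    return _show(ast.parse(expr, mode="eval").body)


# Without parentheses, | is applied first, so the empty string becomes an operand of |.
print(grouping('a | b == "" | c'))    # ((a | b) == ('' | c))

# With parentheses, the comparison is built first and then combined with |.
print(grouping('a | (b == "") | c'))  # ((a | (b == '')) | c)
```

In the unparenthesized form, the empty string ends up as the left operand of |, which matches the error message complaining about or being called with a String.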

In your case, the comparison F.col("state")=="" is missing its surrounding parentheses. The code below should work.

df.filter((F.col("state").isNull()) | (F.col("state") == "") | (F.col("state").contains("")) | (F.col("number").isNull())).show()

Please try and see if this helps.
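On the original question of why the same F.col("state")=="" works inside F.when: that call contains a single comparison and no |, so there is nothing for operator precedence to regroup. In the filter, Python tries to evaluate "" | F.col("state").contains("") first; a plain str has no | operator, so Python falls back to the reflected operator on the column, and the string is handed to the column's or. The sketch below imitates that behaviour with a hypothetical FakeColumn class (a stand-in written for this illustration, not PySpark's Column) so that it runs without Spark.

```python
class FakeColumn:
    """Hypothetical stand-in for a column expression; NOT PySpark's Column."""

    def __init__(self, expr: str):
        self.expr = expr

    def _or(self, other):
        # Mimic a backend that can only OR two column expressions together.
        if not isinstance(other, FakeColumn):
            raise TypeError(f"or({type(other).__name__}) does not exist")
        return FakeColumn(f"({self.expr} OR {other.expr})")

    __or__ = _or
    __ror__ = _or  # reflected form: Python calls this for `"" | column`

    def __eq__(self, other):
        rhs = other.expr if isinstance(other, FakeColumn) else repr(other)
        return FakeColumn(f"({self.expr} = {rhs})")

    def is_null(self):
        return FakeColumn(f"({self.expr} IS NULL)")


state = FakeColumn("state")

# A single comparison, as in the F.when(...) call: no | involved, so it works.
print((state == "").expr)  # (state = '')

# Unparenthesized: Python evaluates `"" | state.is_null()` and fails.
try:
    state.is_null() | state == "" | state.is_null()
except TypeError as err:
    print("error:", err)  # error: or(str) does not exist

# Parenthesized: the comparison is built before | combines the conditions.
print((state.is_null() | (state == "")).expr)  # ((state IS NULL) OR (state = ''))
```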


3 REPLIES

Ajay-Pandey
Esteemed Contributor III

Hi @Saswata Dutta​ ,

Please use the code below; it should work:

df.filter((F.col("state").isNull()) | (F.col("state") == "") | (F.col("state").contains("")) | (F.col("number").isNull())).show()

Ajay Kumar Pandey
