01-16-2023 05:29 PM
In my dataframe I have a column named count. If that column's value is greater than zero, the job needs to fail. How can I do that?
01-16-2023 07:26 PM
Hi @Mohammed sadamusean
Can you try the below code in PySpark and let me know if you face any issues:
from pyspark.sql.functions import col

variable_name = df.select(col("Column_Name")).collect()[0][0]
if variable_name > 0:
    dbutils.notebook.exit('Notebook Failed')
Happy Learning!!
01-17-2023 01:44 AM
Code without collect (collect should not be used in production):
if df.filter("count > 0").count() > 0: dbutils.notebook.exit('Notebook Failed')
you can also use a more aggressive version:
if df.filter("count > 0").count() > 0: raise Exception("count bigger than 0")
01-17-2023 03:14 AM
But that will get the total count of the column, right? I need to check each specific column value.
01-17-2023 04:02 AM
First you filter for rows matching your condition. You said the column is named count; suppose it were called col instead, so filter("col > 0"). Then you apply the count() function, which returns how many rows matched the condition, so every row is checked, not just one.
01-17-2023 04:17 AM
It is working, but how can we check the column against two conditions, like count > 0 and count < 0? I tried equal to 0 but it didn't work.
01-17-2023 04:21 AM
Just write the condition like SQL:
"colname > 0 OR colname < 0"
or
"colname != 0"