Community Discussions

Usage of if else condition for data check

Azsdc
New Contributor

Hi,

In a Workflows job, I am trying to add data checks between tasks by using an If/else condition.

I used the following payload to pass parameters to a notebook, and I want to use one of them in the If/else condition:

{
  "job_id": XXXXX,
  "notebook_params": {
    "minMPID": "200912",
    "maxMPID": "202312",
    "Flag": 1
  }
}

After that, I added an If/else condition task between two tasks to check the data, referencing maxMPID to test whether maxMPID == 202312. However, the task fails.

How do I define a parameter so that it can be used in an If/else condition task? Alternatively, do you have any suggestions for adding sanity checks on the data (for example, whether there are duplicates, or whether a table has been filled) between tasks?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Azsdc, in Databricks Jobs you can use conditional logic to control task execution.

Let’s break down how you can achieve this:

  1. Using Parameters in If/Else Conditions:

    • To define a parameter for use in an If/Else condition within a job, follow these steps:
      1. Edit the Task: When editing a task with one or more dependencies, you can add a Run if condition. This condition is evaluated after completing all task dependencies.
      2. Select Condition: Choose the condition from the “Run if dependencies” drop-down menu in the task configuration.
      3. Evaluate Logic: The Run if condition allows you to specify when the task should run based on the outcome of its dependencies.
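      4. Operand Options: You can reference job or task state using job and task parameter variables or use t...
    • For your specific case, you can use the maxMPID parameter in your If/else condition task to check whether maxMPID == 202312. If maxMPID is defined as a job parameter, the operand can reference it as {{job.parameters.maxMPID}}; a value set by an upstream task can be referenced as {{tasks.<task_name>.values.<key>}}. Parameters supplied only as notebook_params go to the notebook task, so they may not be visible to the condition unless they are also exposed in one of these two ways.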
  2. Adding Sanity Checks for Data:

    • To add sanity checks between tasks, consider the following approaches:
      • Data Duplication Check: If you want to prevent data duplication, you can compare the data processed by the current task with existing data (e.g., in a database table). If duplicates are detected, handle them accordingly (e.g., log an error or skip processing).
      • Table Filled Check: To ensure a table is filled, you can query the table or check its row count. If the table is empty, you can take appropriate actions (e.g., raise an alert or skip downstream tasks).
    • Implement these checks as separate tasks in your job, and use conditional logic to control their execution based on the outcome of preceding tasks.
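
The two checks above (duplicates and an empty table) can be sketched in plain Python as follows. This is a minimal illustration rather than a Databricks API: the function names, the key columns, and the idea of passing rows as dictionaries are all assumptions made for the example. On a large table you would more likely compute the row count and duplicate keys with a Spark aggregation and pass only the results to logic like this.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once in `rows`."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def run_sanity_checks(rows, key_columns):
    """Summarise the 'table filled' and 'duplication' checks for a set of rows.

    `rows` is an iterable of dict-like records; `key_columns` names the
    columns that should be unique together (hypothetical names).
    """
    rows = list(rows)
    duplicates = find_duplicate_keys(rows, key_columns)
    return {
        "row_count": len(rows),
        "is_empty": len(rows) == 0,
        "duplicate_keys": duplicates,
        "passed": len(rows) > 0 and not duplicates,
    }


def assert_sanity(rows, key_columns):
    """Raise if a check fails, so the surrounding task fails as well."""
    result = run_sanity_checks(rows, key_columns)
    if not result["passed"]:
        raise ValueError(f"Sanity check failed: {result}")
    return result
```

If a notebook task raises like this, the task fails, and downstream tasks that keep the default "All succeeded" Run if condition will not run.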

Remember that Databricks Jobs allows you to run tasks conditionally based on various criteria, including the success or failure of dependencies. By combining parameters and conditional logic, you can create robust workflows that handle different...
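
As one way to combine parameters with conditional logic, a check notebook can publish its result as a task value, which an If/else condition task can then compare. The sketch below assumes the notebook receives maxMPID as a parameter. The key name max_mpid_ok, the function name, and the in-memory stand-in class are hypothetical. Inside a Databricks notebook you would pass dbutils.jobs.taskValues, which has a set(key=..., value=...) method, instead of the stand-in.

```python
class InMemoryTaskValues:
    """Local stand-in exposing set(key=..., value=...) like dbutils.jobs.taskValues."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value


def publish_max_mpid_check(task_values, max_mpid, expected="202312"):
    """Compare max_mpid with the expected period and publish the outcome.

    The result is stored as the string "true" or "false" so that a
    downstream condition can compare it with ==.
    """
    matches = str(max_mpid) == expected
    task_values.set(key="max_mpid_ok", value="true" if matches else "false")
    return matches


# Local usage; in a notebook, replace the stand-in with dbutils.jobs.taskValues.
task_values = InMemoryTaskValues()
publish_max_mpid_check(task_values, "202312")
```

If the notebook task were named check_data (a hypothetical name), the If/else condition task could then use {{tasks.check_data.values.max_mpid_ok}} == true as its condition.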

Feel free to adjust the specifics based on your job requirements, and let me know if you need further assistance! 🚀

 