Usage of if else condition for data check - Databricks Community - 65756



Hi,

In a Workflows job, I am trying to add data checks between tasks by using If/else condition tasks.

I used the following payload in a notebook to pass the parameters that the If/else condition should check:

```json
{
  "job_id": XXXXX,
  "notebook_params": {
    "minMPID": "200912",
    "maxMPID": "202312",
    "Flag": 1
  }
}
```

I then added an If/else condition task between two tasks that references maxMPID to check whether maxMPID == 202312. However, the condition task fails.

How do I define a parameter so that it can be used in an If/else condition task? Alternatively, do you have any suggestions for adding data sanity checks between tasks (for example, checking for duplicates or confirming that a table has been populated)?


Hi @Azsdc, in Databricks Jobs you can use conditional logic to control task execution.

Let’s break down how you can achieve this:

Using Parameters in If/Else Conditions:
- Conditional execution in a job involves two related but separate settings:
  1. Edit the task: When editing a task with one or more dependencies, you can add a Run if condition. This condition is evaluated after all of the task's dependencies complete.
  2. Select the condition: Choose it from the "Run if dependencies" drop-down menu in the task configuration.
  3. Evaluate logic: A Run if condition decides whether the task runs based on the outcome (success or failure) of its dependencies; it does not compare parameter values.
  4. Operand options: To compare values, add an If/else condition task instead. You can reference job or task state using job and task parameter variables or use t...¹.
- For your specific case, use an If/else condition task that compares maxMPID with 202312 using the == operator. For the condition to resolve the value, maxMPID should be defined as a job parameter and referenced as {{job.parameters.maxMPID}}. Values passed only through notebook_params set notebook widget values and, as far as I know, are not visible to the condition task, which is a likely cause of the failure.
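The If/else setup described above could look roughly like the following Jobs API request bodies. This is a sketch under stated assumptions: the task keys (`load_data`, `check_maxMPID`, `transform`), the job name, and the notebook paths are hypothetical, and the field names (`parameters`, `condition_task` with `op`/`left`/`right`, `depends_on` with `outcome`, and `job_parameters` for run-now) reflect my reading of the Jobs API 2.1 schema, so please verify them against the API reference. Cluster settings are omitted.

```python
import json

# Hypothetical job settings. maxMPID is declared as a *job* parameter so that
# the If/else condition task can resolve {{job.parameters.maxMPID}}.
# Field names assume the Jobs API 2.1 schema; cluster configuration is omitted.
job_settings = {
    "name": "mpid_pipeline",  # hypothetical job name
    "parameters": [
        {"name": "minMPID", "default": "200912"},
        {"name": "maxMPID", "default": "202312"},
        {"name": "Flag", "default": "1"},
    ],
    "tasks": [
        {
            "task_key": "load_data",  # hypothetical upstream task
            "notebook_task": {"notebook_path": "/Workspace/pipeline/load_data"},
        },
        {
            "task_key": "check_maxMPID",
            "depends_on": [{"task_key": "load_data"}],
            # If/else condition task: compares the left and right operands.
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{job.parameters.maxMPID}}",
                "right": "202312",
            },
        },
        {
            "task_key": "transform",  # hypothetical task on the "true" branch
            "depends_on": [{"task_key": "check_maxMPID", "outcome": "true"}],
            "notebook_task": {"notebook_path": "/Workspace/pipeline/transform"},
        },
    ],
}

# Hypothetical run-now body: values go in job_parameters instead of
# notebook_params. "XXXXX" is a placeholder; the API expects the numeric job ID.
run_now_body = {
    "job_id": "XXXXX",
    "job_parameters": {"minMPID": "200912", "maxMPID": "202312", "Flag": "1"},
}

print(json.dumps(job_settings, indent=2))
print(json.dumps(run_now_body, indent=2))
```

The downstream task depends on the condition task's `"true"` outcome, so it is skipped when maxMPID differs from 202312; a second task depending on the `"false"` outcome could handle the other branch.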
Adding Sanity Checks for Data:
- To add sanity checks between tasks, consider the following approaches:
  - Data Duplication Check: If you want to prevent data duplication, you can compare the data processed by the current task with existing data (e.g., in a database table). If duplicates are detected, handle them accordingly (e.g., log an error or skip processing).
  - Table Filled Check: To ensure a table is filled, you can query the table or check its row count. If the table is empty, you can take appropriate actions (e.g., raise an alert or skip downstream tasks).
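- Implement each check as a separate task (for example, a small notebook) placed between the tasks it guards. A check can either raise an error, which fails the task so that downstream tasks using the default Run if setting (All succeeded) are skipped, or publish its result as a task value that an If/else condition task then evaluates.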
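The two checks above can be sketched in plain Python as follows. This is a minimal illustration, assuming the data is available as a list of row dictionaries; the function names and the `id` key column are hypothetical. In a Databricks notebook you would more likely compute the same counts with Spark (for example `df.count()` and `df.groupBy(*key_cols).count().filter("count > 1")`).

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def find_duplicate_keys(rows: Iterable[Mapping], key_cols: Sequence[str]) -> dict:
    """Return {key_tuple: count} for every key that appears more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def run_sanity_checks(rows: list, key_cols: Sequence[str], min_rows: int = 1) -> dict:
    """Raise ValueError if the data is too small or has duplicate keys.

    An uncaught exception fails the notebook task, so downstream tasks that
    use the default Run if setting (All succeeded) do not run.
    """
    row_count = len(rows)
    if row_count < min_rows:
        raise ValueError(
            f"Table check failed: {row_count} rows, expected at least {min_rows}"
        )
    dups = find_duplicate_keys(rows, key_cols)
    if dups:
        raise ValueError(
            f"Duplicate check failed: {len(dups)} duplicated key(s), e.g. {next(iter(dups))}"
        )
    return {"row_count": row_count, "duplicate_keys": 0}


# Usage with hypothetical sample data:
rows = [{"id": 1, "MPID": "202311"}, {"id": 2, "MPID": "202312"}]
print(run_sanity_checks(rows, key_cols=["id"]))  # → {'row_count': 2, 'duplicate_keys': 0}
```

Raising is the simplest way to stop the pipeline. If you would rather branch than fail, the check notebook could instead publish the summary with `dbutils.jobs.taskValues.set(key="row_count", value=...)`, and an If/else condition task should then be able to reference it as `{{tasks.<task_key>.values.row_count}}` (check the dynamic value reference documentation for the exact syntax).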

Remember that Databricks Jobs allows you to run tasks conditionally based on various criteria, including the success or failure of dependencies. By combining parameters and conditional logic, you can create robust workflows that handle different...¹.

Feel free to adjust the specifics based on your job requirements, and let me know if you need further assistance! 🚀
