Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Apply expectations conditionally in SLDP

hdu
New Contributor III

Hi folks, 

The following code runs as expected, and all three rules are validated. 

    @dp.view(name=f"v_validate_source_{table}")
    @dp.expect_all_or_drop({"201-Data row":"row_cnt > 0"})
    @dp.expect_all_or_drop({
        "101-One footer row" : "footer_cnt = 1",
        "102-Row count mismatch" : "footer_row_cnt = row_cnt"
    })
    def validateSourceFileView():
        return df

My question is: how can I apply the rule 201 check conditionally? Something like:

    if condition:
       @dp.expect_all_or_drop({"201-Data row":"row_cnt > 0"})
    @dp.expect_all_or_drop({
        "101-One footer row" : "footer_cnt = 1",
        "102-Row count mismatch" : "footer_row_cnt = row_cnt"
    })

Thank you for your help in advance.

Accepted Solutions

mauriciofh
New Contributor III

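Great question. You cannot wrap a single decorator in an if block the way you wrote it: a decorator line must be followed directly by another decorator or by the function definition, and all decorators are applied at the moment the function is defined.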
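To make the "applied when the function is defined" point concrete, here is a minimal plain-Python sketch. The `tag` and `maybe` helpers are hypothetical stand-ins invented for illustration; they are not part of the `dp` module or any Databricks API. The sketch only shows that a decorator is an ordinary expression evaluated once at definition time, so any condition has to be resolved by then.

```python
def tag(label):
    """Hypothetical stand-in decorator: records a label on the function."""
    def wrap(fn):
        fn.labels = getattr(fn, "labels", []) + [label]
        return fn
    return wrap


def maybe(condition, decorator):
    """Return the decorator when condition is true, otherwise a no-op."""
    return decorator if condition else (lambda fn: fn)


apply_201 = False  # must already have a value when the def below runs


@maybe(apply_201, tag("201"))  # evaluated once, at definition time
@tag("101")
def view():
    return "df"


print(view.labels)  # ['101']
```

Flipping `apply_201` after the function has been defined changes nothing, because the decorators have already run.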

The clean way is to build the expectations dictionary first, then apply one decorator:

    # Rules that always apply.
    rules = {
        "101-One footer row": "footer_cnt = 1",
        "102-Row count mismatch": "footer_row_cnt = row_cnt",
    }

    # Add rule 201 only when the definition-time condition holds.
    if condition:
        rules["201-Data row"] = "row_cnt > 0"

    @dp.view(name=f"v_validate_source_{table}")
    @dp.expect_all_or_drop(rules)
    def validateSourceFileView():
        return df

Important detail: condition must be known at pipeline-definition time (for example, a table name, config flag, or pipeline parameter); it cannot depend on per-row runtime values.
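As a rough sketch of what "known at pipeline-definition time" could look like, the snippet below derives the condition from a string flag and builds the rules dictionary from it. The helper names `parse_flag` and `build_rules` and the hard-coded `raw_value` are assumptions made for illustration; in a real pipeline the string would probably come from a configuration lookup such as `spark.conf.get(...)` with a key of your own choosing.

```python
def parse_flag(value):
    """Interpret a string config value as a boolean (hypothetical helper)."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def build_rules(apply_201):
    """Return the expectations dict, adding rule 201 only when requested."""
    rules = {
        "101-One footer row": "footer_cnt = 1",
        "102-Row count mismatch": "footer_row_cnt = row_cnt",
    }
    if apply_201:
        rules["201-Data row"] = "row_cnt > 0"
    return rules


# Assumption: the flag arrives as a string; it is hard-coded here so the
# sketch runs without a Spark session.
raw_value = "true"
rules = build_rules(parse_flag(raw_value))
print(len(rules))  # 3
```

The resulting `rules` dictionary is what would then be passed to the single `@dp.expect_all_or_drop(rules)` decorator.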

If you need row-level conditional behavior, encode it in the rule expression itself, for example:

    "201-Data row": "NOT apply_201 OR row_cnt > 0"

Here apply_201 is a boolean column (flag) in the DataFrame. This keeps a single rule but makes it conditional per record.
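To check which records such an expression keeps, the sketch below evaluates the same predicate with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in to illustrate the boolean logic; it is not Spark, the `files` table and its rows are made up, and the handling of a NULL flag is not covered.

```python
import sqlite3

RULE_201 = "NOT apply_201 OR row_cnt > 0"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, apply_201 INTEGER, row_cnt INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("a", 1, 5),  # rule applies and there are rows: kept
        ("b", 1, 0),  # rule applies but there are no rows: dropped
        ("c", 0, 0),  # rule switched off for this record: kept
    ],
)

# Keep only the records for which the rule expression is true.
query = f"SELECT name FROM files WHERE {RULE_201} ORDER BY name"
kept = [row[0] for row in conn.execute(query)]
conn.close()

print(kept)  # ['a', 'c']
```

Only record "b", where the flag is set and the row count is zero, fails the rule.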

hdu
New Contributor III

Thank you, @mauriciofh. You are right: I should add or remove the validation rules in the rule dictionary.