05-14-2024 07:27 AM
The example at https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records says that one can query the DLT event log for such expectation violations.
In Databricks, I can use expectations to fail or drop records, but how do I capture the reasons (the expectations violated) for each record that is dropped or failed?
Expectation Violated:
{
  "flowName": "a-b",
  "verboseInfo": {
    "expectationsViolated": [
      "x1 is negative"
    ],
    "inputData": {
      "a": {
        "x1": 1,
        "y1": "a"
      },
      "b": {
        "x2": 1,
        "y2": "aa"
      }
    },
    "outputRecord": {
      "x1": 1,
      "y1": "a",
      "x2": 1,
      "y2": "aa"
    },
    "missingInputData": false
  }
}
05-16-2024 07:25 PM
Hi @youcanlearn,
This information would be written to `log4j.txt` as part of a stack trace when the expectation is created with one of the `fail` expectation operators (e.g. `expect_or_fail`). When a failure occurs, you would see a `Caused by` log message such as:
Caused by: java.lang.RuntimeException: Expectation violated: {"flowName":"dlt_autoloader_csv_test","verboseInfo":{"expectationsViolated":["valid_max_length"],"inputData":{},"outputRecord":{"col1":"12345678901234567890123456789","col2":"two","_rescued_data":null},"missingInputData":false}}
...which contains a JSON payload such as the one referenced in the docs you linked to.
Additionally, you can find the stack trace with the same message in the Event Log, within the `error.exceptions` array.
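If you want to pull the "reason" out of such a message programmatically, something like the following might work. It is only a minimal sketch: the helper name `parse_expectation_violation` is made up for illustration, and it assumes the JSON payload directly follows the `Expectation violated:` text, as in the log line above. The same helper could be applied to a message string taken from the Event Log.

```python
import json

# Text that precedes the JSON payload in the log line shown above.
MARKER = "expectation violated:"


def parse_expectation_violation(message):
    """Return the decoded JSON payload of an 'Expectation violated' message.

    Hypothetical helper: returns None when the marker is absent and raises
    json.JSONDecodeError when the text after the marker is not valid JSON.
    """
    idx = message.lower().find(MARKER)
    if idx == -1:
        return None
    rest = message[idx + len(MARKER):].lstrip()
    # raw_decode parses the first JSON value and ignores any trailing text,
    # such as the remaining lines of a stack trace.
    payload, _end = json.JSONDecoder().raw_decode(rest)
    return payload


log_line = (
    'Caused by: java.lang.RuntimeException: Expectation violated: '
    '{"flowName":"dlt_autoloader_csv_test","verboseInfo":'
    '{"expectationsViolated":["valid_max_length"],"inputData":{},'
    '"outputRecord":{"col1":"12345678901234567890123456789","col2":"two",'
    '"_rescued_data":null},"missingInputData":false}}'
)

payload = parse_expectation_violation(log_line)
reasons = payload["verboseInfo"]["expectationsViolated"]  # names of violated expectations
record = payload["verboseInfo"]["outputRecord"]           # the record that failed
print(reasons, record)
```

Here `reasons` holds the expectation names and `record` the offending row, so the two can be stored side by side if you need a per-record rejection reason.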
Hope this helps.
05-22-2024 04:16 AM
Hi @brockb,
I want each failed record to carry a "reason" for being rejected/failed. Is this the best way for me to capture the "reason"?
05-22-2024 06:57 AM
That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared.
If you are looking for a descriptive reason, you would name the expectation accordingly such as:
@dlt.expect_or_fail("this expectation will fail because of reason1 and reason2", "count > 0")
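To keep the descriptive names and their constraints in one place, you could hold them in a dictionary and look the constraint up from the names reported in `expectationsViolated`. This is a rough sketch in plain Python so it runs outside a pipeline: the expectation names, the constraints, and the helper `reasons_with_constraints` are all hypothetical. The dictionary uses the name-to-constraint shape that `dlt.expect_all_or_fail` accepts.

```python
# Hypothetical expectations: the key is the descriptive name (the "reason"),
# the value is the SQL constraint. Inside a pipeline, a dict of this shape
# could be passed to @dlt.expect_all_or_fail(EXPECTATIONS).
EXPECTATIONS = {
    "count must be greater than 0": "count > 0",
    "col1 must be at most 20 characters": "length(col1) <= 20",
}


def reasons_with_constraints(violated_names, expectations):
    """Pair each violated expectation name with its constraint (None if unknown)."""
    return [(name, expectations.get(name)) for name in violated_names]


# Names as they would appear in the `expectationsViolated` array of a payload.
violated = ["col1 must be at most 20 characters"]
pairs = reasons_with_constraints(violated, EXPECTATIONS)
print(pairs)
```

Because the name is the only text that shows up in the failure payload, writing it as a readable sentence is what makes it usable as the rejection reason.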