05-14-2024 07:27 AM
The example at https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records says that you can query the DLT event log for expectation violations like the one below.
In Databricks, I can use expectations to fail or drop records, but how do I capture the reasons (the expectations violated) for each record that was dropped or failed?
Expectation Violated:
{
"flowName": "a-b",
"verboseInfo": {
"expectationsViolated": [
"x1 is negative"
],
"inputData": {
"a": {"x1": 1, "y1": "a"},
"b": {
"x2": 1,
"y2": "aa"
}
},
"outputRecord": {
"x1": 1,
"y1": "a",
"x2": 1,
"y2": "aa"
},
"missingInputData": false
}
}
05-16-2024 07:25 PM
Hi @youcanlearn,
This information would be written to `log4j.txt` as part of a stack trace when the expectation is created with one of the `fail` expectation operators (e.g. `expect_or_fail`). When a failure occurs, you would see a `Caused by` log message such as:
Caused by: java.lang.RuntimeException: Expectation violated: {"flowName":"dlt_autoloader_csv_test","verboseInfo":{"expectationsViolated":["valid_max_length"],"inputData":{},"outputRecord":{"col1":"12345678901234567890123456789","col2":"two","_rescued_data":null},"missingInputData":false}}
...which contains a JSON payload such as the one referenced in the docs you linked to.
Additionally, you can find the stack trace with the same message in the Event Log, within the `error.exceptions` array.
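As a rough sketch of how the violated expectations could be pulled out of such a message, the snippet below locates the `Expectation violated:` marker and decodes the JSON that follows it. The helper name `parse_expectation_violation` is hypothetical (it is not part of any Databricks API), and the code assumes the message has the same shape as the `Caused by` line above, whether it was copied from `log4j.txt` or from an exception message in the Event Log.

```python
import json

# Marker text taken from the "Caused by" line shown above.
MARKER = "Expectation violated:"


def parse_expectation_violation(message):
    """Hypothetical helper: return the decoded JSON payload, or None if the marker is absent."""
    start = message.find(MARKER)
    if start == -1:
        return None
    rest = message[start + len(MARKER):].lstrip()
    # raw_decode parses a single JSON value and tolerates trailing text,
    # such as the remainder of a stack trace.
    payload, _ = json.JSONDecoder().raw_decode(rest)
    return payload


# Sample line copied from the reply above.
line = (
    'Caused by: java.lang.RuntimeException: Expectation violated: '
    '{"flowName":"dlt_autoloader_csv_test","verboseInfo":'
    '{"expectationsViolated":["valid_max_length"],"inputData":{},'
    '"outputRecord":{"col1":"12345678901234567890123456789","col2":"two",'
    '"_rescued_data":null},"missingInputData":false}}'
)

payload = parse_expectation_violation(line)
reasons = payload["verboseInfo"]["expectationsViolated"]
record = payload["verboseInfo"]["outputRecord"]

print(reasons)  # ['valid_max_length']
print(record["col2"])  # two
```

Note that with a `fail` operator the update stops at the failing record, so this recovers the reason for that record rather than a per-record list for the whole batch.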
Hope this helps.
05-22-2024 04:16 AM
Hi @brockb,
I want each failed record to carry a "reason" for being rejected. Is this the best way for me to capture that "reason"?
05-22-2024 06:57 AM
That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared.
If you are looking for a descriptive reason, you would name the expectation accordingly such as:
@dlt.expect_or_fail("this expectation will fail because of reason1 and reason2", "count > 0")