Databricks Community

youcanlearn · ‎05-14-2024

In the example at https://docs.databricks.com/en/delta-live-tables/expectations.html#fail-on-invalid-records, the docs state that one can query the DLT event log for such expectation violations.

In Databricks, I can use expectations to fail or drop records, but how do I capture the reasons (the expectations violated) for each record that was dropped or failed?

Expectation Violated:
{
  "flowName": "a-b",
  "verboseInfo": {
    "expectationsViolated": [
      "x1 is negative"
    ],
    "inputData": {
      "a": {"x1": 1, "y1": "a"},
      "b": {
        "x2": 1,
        "y2": "aa"
      }
    },
    "outputRecord": {
      "x1": 1,
      "y1": "a",
      "x2": 1,
      "y2": "aa"
    },
    "missingInputData": false
  }
}

brockb · ‎05-16-2024

Hi @youcanlearn,

This information would be written to `log4j.txt` as part of a stack trace when the expectation is created with one of the `fail` expectation operators (e.g. `expect_or_fail`). When a failure occurs, you would see a `Caused by` log message such as:

Caused by: java.lang.RuntimeException: Expectation violated: {"flowName":"dlt_autoloader_csv_test","verboseInfo":{"expectationsViolated":["valid_max_length"],"inputData":{},"outputRecord":{"col1":"12345678901234567890123456789","col2":"two","_rescued_data":null},"missingInputData":false}}

...which contains a JSON payload such as the one referenced in the docs you linked to.

Additionally, you could find the stack trace with the same messaging in the Event Log within the `error.exceptions` array.
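To work with that payload programmatically, one option is to pull the JSON out of the message text. The following is only a minimal sketch: it assumes the message contains the literal marker `Expectation violated: ` followed directly by a JSON object, as in the sample line above. The helper name `parse_expectation_violation` is hypothetical, not part of any Databricks API.

```python
import json

# Assumption: the payload always follows this literal marker, as in the sample above.
MARKER = "Expectation violated: "


def parse_expectation_violation(message):
    """Return the JSON payload that follows MARKER as a dict, or None if not found."""
    start = message.find(MARKER)
    if start == -1:
        return None
    try:
        # raw_decode parses a single JSON value and tolerates trailing text
        # (for example, the rest of a stack trace line).
        payload, _end = json.JSONDecoder().raw_decode(message[start + len(MARKER):])
    except json.JSONDecodeError:
        return None
    return payload


# The sample "Caused by" line from this post.
sample_line = (
    'Caused by: java.lang.RuntimeException: Expectation violated: '
    '{"flowName":"dlt_autoloader_csv_test","verboseInfo":{"expectationsViolated":'
    '["valid_max_length"],"inputData":{},"outputRecord":{"col1":'
    '"12345678901234567890123456789","col2":"two","_rescued_data":null},'
    '"missingInputData":false}}'
)

info = parse_expectation_violation(sample_line)
if info is not None:
    print(info["flowName"])
    print(info["verboseInfo"]["expectationsViolated"])
    print(info["verboseInfo"]["outputRecord"])
```

The same helper could be applied to message strings taken from the `error.exceptions` array in the event log, provided they carry the same text.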

Hope this helps.

youcanlearn · ‎05-22-2024

Hi @brockb,

I want each failed record to carry a "reason" for being rejected/failed. Is this the best way for me to capture the "reason"?

brockb · ‎05-22-2024

That's right, the "reason" would be "x1 is negative" in your example and "valid_max_length" in the example JSON payload that I shared.

If you are looking for a descriptive reason, you would name the expectation accordingly such as:

@dlt.expect_or_fail("this expectation will fail because of reason1 and reason2", "count > 0")
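If there are several rules, one way to keep the descriptive names in a single place is a dict of name to constraint. This is a rough sketch under assumptions: the rule names, the constraints, and the `describe` helper are all hypothetical illustrations, and the `dlt` usage is shown only in comments because the `dlt` module is available only inside a pipeline.

```python
# Hypothetical rule set: descriptive expectation name -> SQL constraint.
RULES = {
    "x1 must not be negative": "x1 >= 0",
    "col1 must be at most 20 characters": "length(col1) <= 20",
}

# Inside a DLT pipeline, the dict could be passed to one decorator, e.g.:
#
#   import dlt
#
#   @dlt.table
#   @dlt.expect_all_or_fail(RULES)
#   def my_table():   # hypothetical table function
#       ...


def describe(violated_names, rules=RULES):
    """Pair each violated expectation name with its constraint for reporting."""
    return [
        f"{name} (constraint: {rules.get(name, 'unknown')})"
        for name in violated_names
    ]


# Example: names as they would appear under "expectationsViolated".
print(describe(["x1 must not be negative"]))
```

Since the expectation name is what appears under `expectationsViolated`, the name itself doubles as the "reason"; the lookup only adds the constraint text for context.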
