Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks pipeline fails expectation on executing Python script, throws error: Update FAILED

RobFer1985
New Contributor

Hi Community,

I'm new to Databricks and am trying to implement pipeline expectations. The pipelines work without errors and my job runs. I've tried multiple ways to implement expectations, in both SQL and Python; I keep resolving the errors but end up with the same failure. I'm working with the free trial version of Databricks. Is there a limitation on building expectations in the trial version? Are there table permissions in Databricks I'm not taking into account? The orders_2 table is a streaming table; are there limitations to applying expectations to streaming tables? My Python code:

 

%python
from pyspark import pipelines as dp
from pyspark.sql.functions import col

@dp.table(
    name="xyntrel_bronze.bronze.orders_2",
    comment="Orders table with data quality constraints"
)
@dp.expect_or_fail("row count > 100", "COUNT(*) > 100")
@dp.expect_or_fail("customer_id not null", "customer_id IS NOT NULL") 
def bronze_table():
    return (
        spark.readStream.table("xyntrel_bronze.bronze.orders_2")
        .filter(col("order_id").isNotNull())
    )

The complete error in JSON:

"timestamp": "2025-12-10T18:23:25.679Z",
    "message": "Update 19907c is FAILED.",
    "level": "ERROR",
    "error": {
        "exceptions": [
            {
                "message": "",
                "error_class": "_UNCLASSIFIED_PYTHON_COMMAND_ERROR",
                "short_message": ""
            }
        ],
        "fatal": true
    },
    "details": {
        "update_progress": {
            "state": "FAILED"
        }
    },
    "event_type": "update_progress",
    "maturity_level": "STABLE"
}

 Thanks guys!

2 REPLIES

emma_s
Databricks Employee

Hey, I think it's the row-count condition causing the issue. Expectations are evaluated per row: each record is checked against the constraint expression, so you're effectively asking for COUNT(*) on every individual record, which always evaluates to 1 and therefore always fails your > 100 condition. I hope this helps.
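
For illustration, here is a minimal sketch of the same table definition with the aggregate check dropped and only row-level predicates kept. It reuses the pyspark.pipelines import and decorator names from the original post (whether that module exposes expect_or_fail depends on your runtime), and the source table orders_raw is a placeholder rather than a name from this thread:

%python
from pyspark import pipelines as dp

@dp.table(
    name="xyntrel_bronze.bronze.orders_2",
    comment="Orders table with row-level data quality constraints"
)
# Each expectation is a per-row predicate over the record's own columns.
@dp.expect_or_fail("customer_id not null", "customer_id IS NOT NULL")
@dp.expect_or_fail("order_id not null", "order_id IS NOT NULL")
def bronze_table():
    # Placeholder source: point this at wherever the raw orders actually land,
    # not at the table this function defines.
    return spark.readStream.table("xyntrel_bronze.bronze.orders_raw")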

carlo968rojer
Visitor

Hello, @RobFer1985 

The primary cause of your error is a circular reference in your logic: you are defining a table named orders_2 while simultaneously trying to readStream from that same table. In Delta Live Tables (DLT), the function acts as the "writer," so it cannot read from itself during the same process without causing the initialization to crash. Additionally, your code uses the wrong library; you should use import dlt rather than pyspark.pipelines to ensure the DLT engine recognizes the decorators. You also need to adjust your expectations because DLT processes constraints on a row-by-row basis. A constraint like COUNT(*) > 100 will fail because the engine validates individual records as they stream through, rather than performing a global aggregation of the entire table.

To resolve this, ensure your readStream points to a separate source table or file path (the "raw" data) and remove the aggregate count expectation. Your final code should look like a standard DLT definition where the input is a source table and the output is your new orders_2 table. Once you point the stream to the actual source data and use the standard dlt library, the "Unclassified Python Command Error" should disappear.
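
A rough sketch of what that could look like, assuming the standard dlt module and a separate raw source table. The source name xyntrel_bronze.bronze.orders_raw is a placeholder (it does not appear in the original post), and the target catalog and schema come from the pipeline settings rather than the decorator:

import dlt
from pyspark.sql.functions import col

@dlt.table(
    name="orders_2",
    comment="Orders table with row-level data quality constraints"
)
# Row-level expectation: checked against each record as it streams through.
@dlt.expect_or_fail("customer_id not null", "customer_id IS NOT NULL")
def orders_2():
    # Read from a separate raw source, never from the table being defined here.
    return (
        spark.readStream.table("xyntrel_bronze.bronze.orders_raw")
        .filter(col("order_id").isNotNull())
    )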