Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ON VIOLATION DROP ROW Question!

chorongs
New Contributor III

What exactly does "ON VIOLATION DROP ROW" do?

1 ACCEPTED SOLUTION

Accepted Solutions

pvignesh92
Honored Contributor

Hi @chorongs, Delta Live Tables has a feature that lets you check the quality of your dataset as it is loaded into the target table and take an action, such as dropping the invalid records that did not meet your conditions or failing the job.

Ex. 

CONSTRAINT valid_current_page EXPECT (current_page_id IS NOT NULL and current_page_title IS NOT NULL) ON VIOLATION DROP ROW

In the above example, if a record does not meet the expectation, ON VIOLATION DROP ROW will not load it into your target table. Information about the invalid records is also captured in the metrics collected as part of the DLT pipeline.

You can read more about this at the link below:

https://docs.databricks.com/delta-live-tables/expectations.html#language-sql
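The drop-row semantics can be sketched outside of DLT as a simple filter-and-count. This is only an illustration of the behaviour, not the Databricks API; the function and row names here are made up:

```python
# Hypothetical sketch of ON VIOLATION DROP ROW semantics in plain Python.
# The real feature runs inside a Delta Live Tables pipeline.

def expect_or_drop(rows, predicate):
    """Keep rows that satisfy the expectation; drop and count the rest,
    mirroring how DLT records violation metrics for the pipeline."""
    kept, dropped = [], 0
    for row in rows:
        if predicate(row):
            kept.append(row)
        else:
            dropped += 1
    return kept, dropped

rows = [
    {"current_page_id": 1, "current_page_title": "Home"},
    {"current_page_id": None, "current_page_title": "Orphan"},  # violates expectation
    {"current_page_id": 3, "current_page_title": None},         # violates expectation
]

# Same predicate as the valid_current_page expectation above.
valid_current_page = lambda r: (r["current_page_id"] is not None
                                and r["current_page_title"] is not None)

kept, dropped = expect_or_drop(rows, valid_current_page)
print(len(kept), dropped)  # 1 row loaded, 2 rows dropped
```

The key point is that the write succeeds for the valid rows; only the violating rows are excluded, and their count is surfaced as a metric.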


3 REPLIES

yogu
Honored Contributor III

It means any rows that violate the specified constraint are dropped instead of throwing an error and aborting the write operation. This can be useful when you want to handle constraint violations by simply excluding the problematic rows from the write, allowing the rest of the data to be written successfully.

Here's an example of how you can use it:

spark.conf.set("spark.databricks.delta.constraints", "column_1 > 0")
spark.conf.set("spark.databricks.delta.constraints.mode", "ON VIOLATION DROP ROW")

# Write DataFrame to a Databricks Delta table
df.write.format("delta").mode("append").save("path/to/table")

In this example, the constraint is specified as "column_1 > 0". Any rows violating this constraint would be dropped during the write operation, and the remaining rows satisfying the constraint would be written to the Delta table.

pvignesh92
Honored Contributor

I think this is not supported in Delta tables yet. The expectations feature for defining data quality is available only in Delta Live Tables. Let me know if you see any examples in the documentation for this on a Delta table.

