cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Delta Table Upsert fails when source attributes are missing

MustangR
New Contributor

Hi All,

I am trying to merge a json to delta table. Since the Json is basically from MongoDB which does not have a schema, there are chances of having missing attributes expected by delta table schema validation. Schema Evolution is enabled as well. How do i handle this ? Is there a way to add these missing attributes and default values or disable schema validation for such cases. I am using Delta table framework for scala. 

Thanks

2 REPLIES 2

Palash01
Valued Contributor

Hey @MustangR 

Thanks for bringing up your concern, You're right, merging JSON data from MongoDB with a Delta table can pose challenges due to schema differences. Here are two ways which I could think of to handle this situation:

1. Leveraging Schema Evolution:

  • Automatic Schema Evolution: When schema evolution is enabled, Delta table allows adding new columns during writes. However, it treats missing required columns differently:
    • Schema on Read vs. Write: If the missing column is present in the Delta table schema used for reading but absent in the write schema, the merge will fail.
    • New Required Column: If the missing column is a newly introduced requirement in the Delta table schema, the merge will fail.

How to solve this problem: 

  • Existing Optional Columns: Missing optional columns will have NULL values during merge, no further action needed.
  • New Optional Columns: If you anticipate encountering new optional columns in the JSON data, add them to your Delta table schema with NULL default values. This allows automatic handling by schema evolution.
  • Required Columns Present but not in Write Schema: Modify your write process to include the required column from the Delta table schema.
  • Required Columns Newly Introduced: If the requirement is recent, consider adding the column to the Delta table schema with a suitable default value to allow future merges. You can then backfill existing data if needed.

2. Preprocessing JSON Data:

  • Identify Missing Attributes: Before merging, analyze your MongoDB JSON data to identify potentially missing attributes compared to the Delta table schema.
  • Add Missing Attributes: Add missing attributes to the JSON data with default values before writing them to Delta. You can use libraries like spark-sql-catalyst to perform schema-aware transformations.
  • Handle Schema Violations: If you cannot add missing attributes, you can create a separate "error" table to store rows that violate the Delta table schema.

 

Leave a like if this helps! Kudos,
Palash

JohnM256
New Contributor II

How do I set Existing Optional Columns?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group