cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Delta Table Upsert fails when source attributes are missing

MustangR
New Contributor

Hi All,

I am trying to merge a json to delta table. Since the Json is basically from MongoDB which does not have a schema, there are chances of having missing attributes expected by delta table schema validation. Schema Evolution is enabled as well. How do i handle this ? Is there a way to add these missing attributes and default values or disable schema validation for such cases. I am using Delta table framework for scala. 

Thanks

1 REPLY 1

Palash01
Contributor III

Hey @MustangR 

Thanks for bringing up your concern, You're right, merging JSON data from MongoDB with a Delta table can pose challenges due to schema differences. Here are two ways which I could think of to handle this situation:

1. Leveraging Schema Evolution:

  • Automatic Schema Evolution: When schema evolution is enabled, Delta table allows adding new columns during writes. However, it treats missing required columns differently:
    • Schema on Read vs. Write: If the missing column is present in the Delta table schema used for reading but absent in the write schema, the merge will fail.
    • New Required Column: If the missing column is a newly introduced requirement in the Delta table schema, the merge will fail.

How to solve this problem: 

  • Existing Optional Columns: Missing optional columns will have NULL values during merge, no further action needed.
  • New Optional Columns: If you anticipate encountering new optional columns in the JSON data, add them to your Delta table schema with NULL default values. This allows automatic handling by schema evolution.
  • Required Columns Present but not in Write Schema: Modify your write process to include the required column from the Delta table schema.
  • Required Columns Newly Introduced: If the requirement is recent, consider adding the column to the Delta table schema with a suitable default value to allow future merges. You can then backfill existing data if needed.

2. Preprocessing JSON Data:

  • Identify Missing Attributes: Before merging, analyze your MongoDB JSON data to identify potentially missing attributes compared to the Delta table schema.
  • Add Missing Attributes: Add missing attributes to the JSON data with default values before writing them to Delta. You can use libraries like spark-sql-catalyst to perform schema-aware transformations.
  • Handle Schema Violations: If you cannot add missing attributes, you can create a separate "error" table to store rows that violate the Delta table schema.

 

Leave a like if this helps! Kudos,
Palash
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.