
Is Delta schema enforcement flexible?

StephanieAlba (Databricks Employee)

In the sense that: is it possible to check only for column names, or only for column data types, or will it always be both?


StephanieAlba (Databricks Employee)

No, I do not believe that is possible. However, I would be interested in understanding a use case where that is ideal behavior.

How Does Schema Enforcement Work?

Delta Lake uses schema validation on write, which means that all new writes to a table are checked for compatibility with the target table's schema at write time. If the schema is not compatible, Delta Lake cancels the transaction altogether (no data is written) and raises an exception to let the user know about the mismatch.
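Here is a minimal PySpark sketch of that write-time check. The `demo` table name is hypothetical, and a Delta-enabled `spark` session (e.g. a Databricks notebook) is assumed:

```python
from pyspark.sql import Row

# Create a small Delta table with columns id (long) and name (string).
# The table name 'demo' is hypothetical.
spark.createDataFrame([Row(id=1, name="a")]) \
    .write.format("delta").saveAsTable("demo")

# Try to append a DataFrame whose 'name' column holds integers instead of
# strings. Delta Lake validates the schema before committing, so the
# transaction is cancelled (no data lands in the table) and an exception
# describing the mismatch is raised.
bad = spark.createDataFrame([Row(id=2, name=99)])
try:
    bad.write.format("delta").mode("append").saveAsTable("demo")
except Exception as e:
    print(f"Write rejected: {e}")
```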

To determine whether a write to a table is compatible, Delta Lake uses the following rules. The DataFrame to be written:

  • Cannot contain any additional columns that are not present in the target table's schema. Conversely, it's OK if the incoming data doesn't contain every column in the table – those columns will simply be assigned null values. (Both directions are shown in the sketch after this list.)
  • Cannot have column data types that differ from the column data types in the target table. If a target table's column contains StringType data, but the corresponding column in the DataFrame contains IntegerType data, schema enforcement will raise an exception and prevent the write operation from taking place.
  • Cannot contain column names that differ only by case. This means that you cannot have columns such as 'Foo' and 'foo' defined in the same table. While Spark can be run in case-sensitive or case-insensitive (default) mode, Delta Lake is case-preserving but case-insensitive when storing the schema, and Parquet is case-sensitive when storing and returning column information. To avoid potential mistakes, data corruption, or loss issues (which we've personally experienced at Databricks), we decided to add this restriction.
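
As a rough illustration of the first rule, continuing with the hypothetical `demo` table from the sketch above (id: long, name: string):

```python
from pyspark.sql import Row

# Missing column: a DataFrame that omits 'name' appends cleanly; the
# missing column is simply filled with nulls for the new rows.
spark.createDataFrame([Row(id=3)]) \
    .write.format("delta").mode("append").saveAsTable("demo")

# Extra column: a DataFrame that adds an unknown 'city' column is rejected
# at write time. (Opting in to schema evolution with
# .option("mergeSchema", "true") would relax this, but that is a separate
# feature from enforcement itself.)
extra = spark.createDataFrame([Row(id=4, name="d", city="SF")])
try:
    extra.write.format("delta").mode("append").saveAsTable("demo")
except Exception as e:
    print(f"Write rejected: {e}")
```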
