No, I do not believe that is possible. However, I would be interested in understanding a use case where that is ideal behavior.
How Does Schema Enforcement Work?
Delta Lake uses schema validation on write, which means that all new writes to a table are checked for compatibility with the target table's schema at write time. If the schema is not compatible, Delta Lake cancels the transaction altogether (no data is written) and raises an exception to let the user know about the mismatch.
To determine whether a write to a table is compatible, Delta Lake uses the following rules (see the sketch after this list). The DataFrame to be written:
- Cannot contain any additional columns that are not present in the target table's schema. Conversely, it's OK if the incoming data doesn't contain every column in the table; those columns will simply be assigned null values.
- Cannot have column data types that differ from the column data types in the target table. If a target table's column contains StringType data, but the corresponding column in the DataFrame contains IntegerType data, schema enforcement will raise an exception and prevent the write operation from taking place.
- Cannot contain column names that differ only by case. This means that you cannot have columns such as "Foo" and "foo" defined in the same table. While Spark can be used in case-sensitive or case-insensitive (the default) mode, Delta Lake is case-preserving but case-insensitive when storing the schema, and Parquet is case-sensitive when storing and returning column information. To avoid potential mistakes, data corruption, or data loss issues (which we've personally experienced at Databricks), we decided to add this restriction.
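To make those rules concrete, here is a minimal PySpark sketch. It assumes open-source Delta Lake is available on the classpath; the `/tmp/delta/demo` path and the column names are made up for illustration. It shows a write with an extra column being rejected, while a write that is merely missing a column goes through.

```python
# Minimal sketch: Delta Lake schema enforcement with PySpark.
# Assumes the delta-spark package is on the classpath (e.g. via spark.jars.packages).
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = (
    SparkSession.builder.appName("schema-enforcement-demo")
    # Standard configs to enable Delta Lake in open-source Spark.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/demo"  # hypothetical table location

# Create a Delta table with two columns: id (long) and name (string).
spark.createDataFrame([(1, "alice")], "id LONG, name STRING") \
    .write.format("delta").save(path)

# 1) Extra column: 'signup_date' is not in the target table's schema,
#    so the transaction is cancelled and nothing is written.
extra_col_df = spark.createDataFrame(
    [(2, "bob", "2019-01-01")], "id LONG, name STRING, signup_date STRING"
)
try:
    extra_col_df.write.format("delta").mode("append").save(path)
except AnalysisException as e:
    print("Write rejected by schema enforcement:", e)

# 2) Missing column is fine: 'name' is absent from the incoming data,
#    so it is simply filled with nulls in the appended rows.
spark.createDataFrame([(3,)], "id LONG") \
    .write.format("delta").mode("append").save(path)
```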