Databricks Community

dpc · ‎01-15-2026

For all it's positives, one of the first general issues we had with databricks was case sensitivity.

We have a lot of data specific filters in our code

Problem is, we land and view data from lots of different case insensitive source systems e.g. SQL Server

As such, we have to be very careful with our code and convert columns to UPPER when making a comparison.

Most of our code is written in SQL.

About 18 months ago I asked whether there was going to be a catalog, schema or table setting for this i.e. make the object case insensitive.

I was told it was on its way.

Not heard anything since and cannot find anything.

Does anybody know whether this is in place or expected?

Thanks

emma_s · ‎01-16-2026

Hi, You can set the default collation at Catalog level or schema level and the tables in the catalog will inherit the collation. This is supported from DBR 17.1 and above.

View solution in original post

szymon_dybczak · ‎01-16-2026

Hi @dpc ,

Like @emma_s mentioned - you can set it at catalog/schema. Table will inherit collation. But you can also define it explicitly for a specific column within a table using collate modifier

View solution in original post

szymon_dybczak · ‎01-15-2026

Hi @dpc ,

I think you can try to use a collation for that purpose. A collation is a set of rules that determines how string comparisons are performed. Collations are used to compare strings in a case-insensitive, accent-insensitive, or trailing space insensitive manner, or to sort strings in a specific language-aware order.

Collation | Databricks on AWS

dpc · ‎01-16-2026

Thanks.

Collation is table specific though isn't it? and you have to apply it to each columns.

Is there a was to just say, this schema, catalog or table is case insensitive or can you only do it by column?

szymon_dybczak · ‎01-16-2026

Hi @dpc ,

Like @emma_s mentioned - you can set it at catalog/schema. Table will inherit collation. But you can also define it explicitly for a specific column within a table using collate modifier

emma_s · ‎01-16-2026

Hi, You can set the default collation at Catalog level or schema level and the tables in the catalog will inherit the collation. This is supported from DBR 17.1 and above.

dpc · ‎01-20-2026

Thanks.

I'll test collation at catalog, sschema and table level using 17.1

dpc · ‎01-22-2026

Thanks @emma_s and @szymon_dybczak

Looks positive.

Some datatype issues moving to 17.1 but that's a separate issue.

I've run some simple tests and it works well so will trial it on out full data set

szymon_dybczak · ‎01-22-2026

Great @dpc , good to hear that 🙂

dpc · ‎01-23-2026

It works but there's a scenario that causes an issue.

If I create a schema with defaultcollation UTF8_LCASE

Then create a table, it marks all the string columns as UTF8_LCASE

Which is fine and works

If I create the table, in the newly created UTF8_LCASE schema from an existing table using:

CREATE TABLE <destination> AS

SELECT *

FROM <source>

WHERE 1 = 2;

The table is UTF8_LCASE but all the columns aren't

So, when I use it, it remains case sensitive

Any thoughts?

Am I doing something wrong here as I really want to create tables that are structurally the same with the exception of case

I am doing this as I land the data and the initial table is derived from the landed structure, before moving it.

Thanks

Databricks Community

Case insensitive data

FREE TRAINING: Databricks Business Impact Accelerator

DAIS 2026 Speaker Spotlight Series #15 | Julien Debard

🌟 Community Pulse: Your Weekly Roundup! May 25 – 31, 2026

Solution Accelerator Series | Recency, Frequency and Monetary (RFM) Segmentation

FLASH SALE: Save 50% on Summit Training ⚡