Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Case insensitive data

dpc
Contributor II

For all its positives, one of the first general issues we had with Databricks was case sensitivity.

We have a lot of data-specific filters in our code.

The problem is that we land and view data from lots of different case-insensitive source systems, e.g. SQL Server.

As such, we have to be very careful with our code and convert columns to UPPER when making a comparison.
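
For illustration, the workaround described above looks roughly like this. The table `customers` and column `country_code` are hypothetical names, not taken from the original post.

```sql
-- Hypothetical table and column names, for illustration only.
-- Both sides are upper-cased so the match does not depend on
-- how the source system stored the value.
SELECT *
FROM customers
WHERE UPPER(country_code) = UPPER('gb');
```

Every filter on a string column needs the same treatment, which is why it is easy to miss one.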

Most of our code is written in SQL.

 

About 18 months ago I asked whether there was going to be a catalog, schema or table setting for this, i.e. one that makes the object case insensitive.

I was told it was on its way.

I've not heard anything since and cannot find anything.

 

Does anybody know whether this is in place or expected?

 

Thanks

 

5 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @dpc ,

I think you can use a collation for that purpose. A collation is a set of rules that determines how string comparisons are performed. Collations are used to compare strings in a case-insensitive, accent-insensitive, or trailing-space-insensitive manner, or to sort strings in a specific language-aware order.

Collation | Databricks on AWS
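
As a minimal sketch of the idea, a collation can be applied to a single expression with `COLLATE`. This assumes a runtime that supports collations and that the session default is the binary (case-sensitive) collation; `UTF8_LCASE` is the case-insensitive collation name used here.

```sql
-- The first comparison uses the default collation (assumed to be the
-- case-sensitive UTF8_BINARY here); the second applies UTF8_LCASE
-- explicitly, so the comparison ignores case.
SELECT
  'Databricks' = 'DATABRICKS'                    AS default_compare,
  'Databricks' COLLATE UTF8_LCASE = 'DATABRICKS' AS lcase_compare;
```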

dpc
Contributor II

Thanks.

Collation is table specific though, isn't it? And you have to apply it to each column.

Is there a way to just say that this schema, catalog or table is case insensitive, or can you only do it by column?

szymon_dybczak
Esteemed Contributor III

Hi @dpc ,

As @emma_s mentioned, you can set it at the catalog or schema level, and tables will inherit the collation. You can also define it explicitly for a specific column within a table using the COLLATE modifier.
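
A sketch of the column-level form, assuming a hypothetical table `customers` (the names are illustrative and are not taken from the screenshot in the original reply):

```sql
-- Hypothetical table: only country_code is declared case insensitive.
CREATE TABLE customers (
  id           BIGINT,
  country_code STRING COLLATE UTF8_LCASE
);

-- The filter no longer needs UPPER(); the comparison uses the
-- column's UTF8_LCASE collation.
SELECT *
FROM customers
WHERE country_code = 'gb';
```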


 

 

emma_s
Databricks Employee

Hi, you can set the default collation at the catalog level or the schema level, and the tables created in that catalog or schema will inherit the collation. This is supported from DBR 17.1 and above.
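
A rough sketch of what that could look like, assuming DBR 17.1 or above. The catalog and schema names (`sales_ci`, `landing`, `analytics`, `reporting_ci`) are hypothetical, and the exact options available may depend on your workspace setup.

```sql
-- Hypothetical names throughout.
-- Catalog-level default: objects created in it are expected to
-- inherit UTF8_LCASE unless they specify something else.
CREATE CATALOG sales_ci DEFAULT COLLATION UTF8_LCASE;
CREATE SCHEMA sales_ci.landing;

-- Schema-level default inside an existing (hypothetical) catalog.
CREATE SCHEMA analytics.reporting_ci DEFAULT COLLATION UTF8_LCASE;

-- A table created here declares no per-column COLLATE; its string
-- columns should pick up the schema default.
CREATE TABLE analytics.reporting_ci.customers (
  id           BIGINT,
  country_code STRING
);
```

It is worth checking on a test table that existing objects behave as expected, since a default set this way is meant for objects created afterwards.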

dpc
Contributor II

Thanks.

I'll test collation at catalog, schema and table level using DBR 17.1.