LIKE Operator Fails with COLLATE UNICODE_CI Columns

Phani1
Databricks MVP

 

Hi All,

We are experiencing problem while we are querying the table which has collate unicode_ci attribute enabled, Kindly find the below findings and let me know if any alternative approach to fix this issue?

Problem Statement

The LIKE operator in a query does not work when COLLATE UNICODE_CI is enabled on a table column, even though the = (equals) operator works fine with the same column and value.
 

Steps to Reproduce

1. Table Created with UNICODE_CI Property on Name Column (Case-Insensitive)

CREATE TABLE test_catalog_ph.bronze.my_table_cs_new2 (
id INT,
name STRING COLLATE UNICODE_CI
);

2. Sample Data Inserted

INSERT INTO test_catalog_ph.bronze.my_table_cs_new2 VALUES (1, 'TEST');
INSERT INTO test_catalog_ph.bronze.my_table_cs_new2 VALUES (2, 'test');

Query with = Operator = Works

SELECT * FROM test_catalog_ph.bronze.my_table_cs_new2
WHERE name = 'Test';

-- Returns 2 rows: 'TEST' and 'test' (case-insensitive match works)

Query with LIKE Operator — Fails

SELECT * FROM test_catalog_ph.bronze.my_table_cs_new2
WHERE name LIKE 'Test';

-- Throws error

Error Message

[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "name LIKE Test"
due to data type mismatch: The first parameter requires the "STRING" type,
however "name" has the type "STRING COLLATE UNICODE_CI".
SQLSTATE: 42K09; line 102, pos 65

Our Findings

The root cause is a data type mismatch between the two sides of the LIKE expression:

Left side (column)-STRING COLLATE UNICODE_CI

Right side (literal)-STRING (UTF8_BINARY)

The Concern

  • The LIKE operator is one of the most commonly used SQL operators for pattern matching
  • Most end users and BI tools (including Power BI, Tableau, etc.) rely heavily on LIKE for filtering
  • If = works with UNICODE_CI columns but LIKE does not, this creates an inconsistent and confusing experience
  • End users should not be expected to add COLLATE UNICODE_CI to every LIKE query manually

Regards,

Phani