cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Bug in unity catalog registering external database that is case sensitive

AurelioGesino
New Contributor II

I successfully registered in my Unity Catalog an external Database ```dwcore``` that is hosted on SQL server.
I first added the connection in "External Data": tested the connection and it was successful.
I then added the database on top: tested the connection and it was successful.

Then on the unity catalog explorer I checked I saw that the tables can be seen under the "dbo" schema.
I can see all table listed on the tree on the left.
But then, if I try to query any table, the table is not found by Spark.

I tested with another postgres database and everything works like a charm.
I think the issue is this: my SQL server table names are case sensitive.

So for example I have a table "AllAlarms" that in the external catalog gets registered all lowercase.
It becomes "allalarms".
So when I try to query the table in the unity catalog the real SQL table does not get resolved, because I'm querying with a lowercase name when it should have been registered registering its real case.
I tried as well to set Spark to be 

spark.sql.caseSensitive', True

but that does not help, because the error occurs when the table is registered in the catalog.

So I think this is a bug with Databricks.



2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @AurelioGesinoIt seems you’ve encountered an issue with table names when connecting to an external SQL Server database in Databricks.

Let’s break down the situation and explore potential solutions:

  1. Table Name Case Sensitivity:

    • You’ve correctly identified that SQL Server table names are case sensitive. When you register these tables in the Unity Catalog, they are converted to lowercase.
    • For example, your table “AllAlarms” becomes “allalarms” in the catalog.
  2. Databricks Naming Convention:

    • Databricks follows a preferred naming convention for tables and databases: lowercase with underscores.
    • This convention aligns with the behavior of the Hive Metastore, which is case insensitive.
    • When querying tables, Databricks treats names as case insensitive, so “MyTable” is equivalent to “mytable” and “MYTABLE.”
  3. Catalog Registration:

    • The issue arises during catalog registration. When you register a table, its name is converted to lowercase.
    • Consequently, when you query using the original case (e.g., “AllAlarms”), Spark cannot find the corresponding table.
  4. Spark Configuration:

    • Setting spark.sql.caseSensitive to True won’t resolve this issue because it affects the behavior of Spark’s SQL queries, not the catalog registration.
    • The error occurs before Spark processes the query.
  5. Recommended Approach:

    • To work around this, follow Databricks’ naming convention:
      • Register your tables with lowercase names (e.g., “allalarms”).
      • Query using the same lowercase names.
    • If possible, rename your SQL Server tables to lowercase before registering them in the Unity Catalog.
  6. Documentation Reference:

  7. Bug Report:

    • If you believe this behaviour is unintended or problematic, consider reporting it as a bug to Databricks by filing a support ticket.

Remember that adhering to the lowercase naming convention will ensure consistency and compatibility within Databricks.

AurelioGesino
New Contributor II

Hi Kaniz, thank you for your feedback.
Databases that I have to link cannot be changed, other teams and products rely on those databases as well, this change would be disruptive.

I think this is a limitation in the unity catalog behaviour, as it should take in account that external databases could have a case sensitive collation, either:

- keeping the name of the tables as they are, leaving up to the user to deal with case issues for external tables.
- remapping that internally and hide it to the user
- providing some API where to hook on in the database registration, or perform it programmatically.

So I'll open up a ticket support for it.

Thank you

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.