cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Bug in unity catalog registering external database that is case sensitive

AurelioGesino
New Contributor II

I successfully registered in my Unity Catalog an external Database ```dwcore``` that is hosted on SQL server.
I first added the connection in "External Data": tested the connection and it was successful.
I then added the database on top: tested the connection and it was successful.

Then on the unity catalog explorer I checked I saw that the tables can be seen under the "dbo" schema.
I can see all table listed on the tree on the left.
But then, if I try to query any table, the table is not found by Spark.

I tested with another postgres database and everything works like a charm.
I think the issue is this: my SQL server table names are case sensitive.

So for example I have a table "AllAlarms" that in the external catalog gets registered all lowercase.
It becomes "allalarms".
So when I try to query the table in the unity catalog the real SQL table does not get resolved, because I'm querying with a lowercase name when it should have been registered registering its real case.
I tried as well to set Spark to be 

spark.sql.caseSensitive', True

but that does not help, because the error occurs when the table is registered in the catalog.

So I think this is a bug with Databricks.



2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @AurelioGesinoIt seems you’ve encountered an issue with table names when connecting to an external SQL Server database in Databricks.

Let’s break down the situation and explore potential solutions:

  1. Table Name Case Sensitivity:

    • You’ve correctly identified that SQL Server table names are case sensitive. When you register these tables in the Unity Catalog, they are converted to lowercase.
    • For example, your table “AllAlarms” becomes “allalarms” in the catalog.
  2. Databricks Naming Convention:

    • Databricks follows a preferred naming convention for tables and databases: lowercase with underscores.
    • This convention aligns with the behavior of the Hive Metastore, which is case insensitive.
    • When querying tables, Databricks treats names as case insensitive, so “MyTable” is equivalent to “mytable” and “MYTABLE.”
  3. Catalog Registration:

    • The issue arises during catalog registration. When you register a table, its name is converted to lowercase.
    • Consequently, when you query using the original case (e.g., “AllAlarms”), Spark cannot find the corresponding table.
  4. Spark Configuration:

    • Setting spark.sql.caseSensitive to True won’t resolve this issue because it affects the behavior of Spark’s SQL queries, not the catalog registration.
    • The error occurs before Spark processes the query.
  5. Recommended Approach:

    • To work around this, follow Databricks’ naming convention:
      • Register your tables with lowercase names (e.g., “allalarms”).
      • Query using the same lowercase names.
    • If possible, rename your SQL Server tables to lowercase before registering them in the Unity Catalog.
  6. Documentation Reference:

  7. Bug Report:

    • If you believe this behaviour is unintended or problematic, consider reporting it as a bug to Databricks by filing a support ticket.

Remember that adhering to the lowercase naming convention will ensure consistency and compatibility within Databricks.

AurelioGesino
New Contributor II

Hi Kaniz, thank you for your feedback.
Databases that I have to link cannot be changed, other teams and products rely on those databases as well, this change would be disruptive.

I think this is a limitation in the unity catalog behaviour, as it should take in account that external databases could have a case sensitive collation, either:

- keeping the name of the tables as they are, leaving up to the user to deal with case issues for external tables.
- remapping that internally and hide it to the user
- providing some API where to hook on in the database registration, or perform it programmatically.

So I'll open up a ticket support for it.

Thank you

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!