2 weeks ago
I have a Personal cluster version 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12) and a SQL Warehouse in a databricks environment. When I use the following code to create a table in a catalog, it gives me different column types when run on the cluster vs the warehouse:
%sql
create or replace table [catalog].[schema].[test_table_name]
using delta
comment 'This has a comment'
as
select
  id,
  name as new_name,
  created_date as new_created_date,
  current_timestamp() as test_timestamp,
  coalesce(name, 'Replaced name') as test_coalesce_name,
  coalesce(id, '-1') as test_coalesce_id,
  coalesce(created_date, '2024-12-11') as test_coalesce_date
from (
  select
    cast(col1 as int) as id,
    cast(col2 as string) as name,
    cast(col3 as date) as created_date
  from values
    (1, 'Alice', '2024-12-01'),
    (2, 'Bob', '2024-12-02'),
    (3, 'Charlie', '2024-12-03'),
    (4, 'David', '2024-12-04'),
    (5, 'Eve', '2024-12-05'),
    (6, 'Frank', '2024-12-06'),
    (7, 'Grace', '2024-12-07'),
    (8, 'Hank', '2024-12-08'),
    (9, 'Ivy', '2024-12-09'),
    (10, 'Jack', '2024-12-10'),
    (11, NULL, '2024-12-11'),
    (NULL, 'NULL Values', NULL)
) as temp_table
When running on a SQL warehouse, the column types for the coalesced columns are resolved correctly (int and date). However, when running on a cluster, they are not resolved and are instead converted to strings. Is this expected behaviour?
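To isolate the issue, here is a minimal repro of just the coalesce type resolution (the expected-type comments reflect the behaviour described in this thread, not guaranteed output):

```sql
-- typeof() reports the resolved result type of an expression.
-- Compare the type of a mixed-type coalesce on each compute.
select
  typeof(coalesce(cast(1 as int), '-1'))                     as coalesce_int_type,
  typeof(coalesce(cast('2024-12-11' as date), '2024-12-11')) as coalesce_date_type;
-- On the SQL warehouse these resolve to int / date;
-- on a cluster with default settings, both come back as string.
```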
Have tried on two different databricks environments and have the same result.
2 weeks ago
Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and testing again?
2 weeks ago
This should not be expected, as both compute types support coalesce. Have you tried the same in clusters with other DBRs, to confirm whether this is specific to 15.4 or happens with any runtime?
2 weeks ago - last edited 2 weeks ago
I have tried with Databricks Runtimes 14.3 and 16.0, with the same results. I wouldn't like to use anything prior to 14.3, so I don't want to try earlier versions. Remember, all the computes support coalesce; it's just that when using a DBR instead of a SQL warehouse, coalesce behaves differently. Specifically, it stops respecting the data types coming through the coalesce when running on a DBR.
2 weeks ago
Appreciate your testing, I will check internally on this and will get back to you.
2 weeks ago
Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and testing again?
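For reference, a sketch of setting this at session level from a notebook cell attached to the cluster (the alternative is adding it to the cluster's Spark config in the cluster settings UI so it applies to every session):

```sql
-- Enable ANSI mode for the current session.
SET spark.sql.ansi.enabled = true;

-- Verify the current value.
SET spark.sql.ansi.enabled;
```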
2 weeks ago
Yes, that seems to work! So, to get the same behaviour on our clusters as on our SQL Warehouse, we have to ensure the Spark setting spark.sql.ansi.enabled=true? That seems like a setting that should default to true; is there a reason it defaults to false?
2 weeks ago
It seems that, as per the docs, this setting is currently true by default only in SQL warehouses; in clusters it defaults to false: https://docs.databricks.com/en/sql/language-manual/sql-ref-ansi-compliance.html#ansi-compliance-in-d...