12-10-2024 07:23 PM
I have a Personal cluster version 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12) and a SQL Warehouse in a databricks environment. When I use the following code to create a table in a catalog, it gives me different column types when run on the cluster vs the warehouse:
%sql
create or replace table [catalog].[schema].[test_table_name] 
using delta
comment 'This has a comment'
as
select 
  id, 
  name as new_name,
  created_date as new_created_date,
  current_timestamp() as test_timestamp,
  coalesce(name,'Replaced name') as test_coalesce_name,
  coalesce(id,'-1') as test_coalesce_id,
  coalesce(created_date,'2024-12-11') as test_coalesce_date
from (
  select 
    cast(col1 as int) as id, 
    cast(col2 as string) as name, 
    cast(col3 as date) as created_date
  from VALUES
  (1, 'Alice', '2024-12-01'),
  (2, 'Bob', '2024-12-02'),
  (3, 'Charlie', '2024-12-03'),
  (4, 'David', '2024-12-04'),
  (5, 'Eve', '2024-12-05'),
  (6, 'Frank', '2024-12-06'),
  (7, 'Grace', '2024-12-07'),
  (8, 'Hank', '2024-12-08'),
  (9, 'Ivy', '2024-12-09'),
  (10, 'Jack', '2024-12-10'),
  (11, NULL, '2024-12-11'),
  (NULL, 'NULL Values', NULL)
) as temp_tableWhen running on a SQL warehouse, the column types for the coalesce'd columns are resolved correctly. However, when running on a cluster, they are not resolved and are converted to strings. Is this expected behaviour?
Have tried on two different databricks environments and have the same result.
12-11-2024 10:30 AM
Can you try to set spark config spark.sql.ansi.enabled true in the cluster and test back?
12-11-2024 03:43 AM
This should not be expected as both compute types should support the coalesce. Have you tried the same in clusters with other DBRs to confirm if this is only related to 15.4 or if is happening with any runtime?
12-11-2024 03:54 AM - edited 12-11-2024 03:56 AM
I have tried with Databricks Runtimes of 14.3 and 16.0 with the same results. I wouldn't like to use anything prior to 14.3, so don't want to try earlier versions.  Remember the computes all support the coalesce, its just that using it with a DBR instead of a SQL warehouse compute, the coalesce causes different behaviour. Specifically it stops respecting data types coming through the coalesce when using a DBR.
12-11-2024 04:01 AM
Appreciate your testing, I will check internally on this and will get back to you.
12-11-2024 10:30 AM
Can you try to set spark config spark.sql.ansi.enabled true in the cluster and test back?
12-11-2024 11:43 AM
Yes, that seems to work! So in order to get the same behaviour for our clusters compared to our SQL Warehouse, we have to ensure the spark setting spark.sql.ansi.enabled=true? That seems like a setting that should default to be true, is there a reason it is defaulting to false?
12-11-2024 12:54 PM
It seems that as per docs as of now this setting is only true by default in warehouses in clusters it is set to false: https://docs.databricks.com/en/sql/language-manual/sql-ref-ansi-compliance.html#ansi-compliance-in-d...
 
					
				
				
			
		
 
					
				
				
			
		
Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!
Sign Up Now