Data Engineering

SQL run on cluster creates table different to SQL Warehouse endpoint

blobbles78
New Contributor II

I have a Personal cluster on version 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12) and a SQL Warehouse in a Databricks environment. When I use the following code to create a table in a catalog, I get different column types depending on whether it runs on the cluster or on the warehouse:

%sql
create or replace table [catalog].[schema].[test_table_name] 
using delta
comment 'This has a comment'
as
select 
  id, 
  name as new_name,
  created_date as new_created_date,
  current_timestamp() as test_timestamp,
  coalesce(name,'Replaced name') as test_coalesce_name,
  coalesce(id,'-1') as test_coalesce_id,
  coalesce(created_date,'2024-12-11') as test_coalesce_date
from (
  select 
    cast(col1 as int) as id, 
    cast(col2 as string) as name, 
    cast(col3 as date) as created_date
  from VALUES
  (1, 'Alice', '2024-12-01'),
  (2, 'Bob', '2024-12-02'),
  (3, 'Charlie', '2024-12-03'),
  (4, 'David', '2024-12-04'),
  (5, 'Eve', '2024-12-05'),
  (6, 'Frank', '2024-12-06'),
  (7, 'Grace', '2024-12-07'),
  (8, 'Hank', '2024-12-08'),
  (9, 'Ivy', '2024-12-09'),
  (10, 'Jack', '2024-12-10'),
  (11, NULL, '2024-12-11'),
  (NULL, 'NULL Values', NULL)
) as temp_table

When running on a SQL Warehouse, the column types for the coalesced columns are resolved correctly. However, when running on a cluster, they are not resolved and are instead converted to strings. Is this expected behaviour?
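
For reference, the difference is easy to see by describing the table after running the statement on each compute type (same placeholder table name as above):

%sql
-- Inspect the resolved column types after the CREATE TABLE above.
-- On the SQL Warehouse, test_coalesce_id and test_coalesce_date keep their
-- INT and DATE types; on the cluster both come back as STRING.
describe table [catalog].[schema].[test_table_name]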

I have tried this on two different Databricks environments with the same result.

1 ACCEPTED SOLUTION

Accepted Solutions

Walter_C
Databricks Employee

Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and test again?

View solution in original post

6 REPLIES

Walter_C
Databricks Employee

This should not be expected, as both compute types support coalesce. Have you tried the same in clusters with other DBRs to confirm whether this is only related to 15.4 or happens on any runtime?

blobbles78
New Contributor II

I have tried with Databricks Runtimes 14.3 and 16.0 with the same results. I wouldn't like to use anything prior to 14.3, so I don't want to try earlier versions. Remember that all the compute types support coalesce; it's just that using it on a DBR instead of a SQL Warehouse causes different behaviour. Specifically, on a DBR it stops respecting the data types coming through the coalesce.


Walter_C
Databricks Employee
Databricks Employee

Appreciate your testing. I will check internally on this and get back to you.

Walter_C
Databricks Employee

Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and test again?
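
For a quick test, this can also be set at the session level in a notebook cell before re-running the CREATE TABLE (a minimal sketch; setting spark.sql.ansi.enabled to true in the cluster's Spark config achieves the same for all sessions):

%sql
-- Enable ANSI mode for the current SQL session only; add
-- spark.sql.ansi.enabled true to the cluster's Spark config
-- to apply it cluster-wide.
SET spark.sql.ansi.enabled = true;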

blobbles78
New Contributor II

Yes, that seems to work! So in order to get the same behaviour on our clusters as on our SQL Warehouse, we have to ensure the Spark setting spark.sql.ansi.enabled=true? That seems like a setting that should default to true; is there a reason it defaults to false?

Walter_C
Databricks Employee

It seems that, as per the docs, this setting is currently only true by default on SQL warehouses; on clusters it is set to false: https://docs.databricks.com/en/sql/language-manual/sql-ref-ansi-compliance.html#ansi-compliance-in-d...
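
If in doubt about which mode a given compute is using, the current value can be read back directly (SET with no value returns the effective setting):

%sql
-- Returns the effective value: true by default on a SQL Warehouse,
-- false by default on an all-purpose cluster unless overridden.
SET spark.sql.ansi.enabled;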
