Data Engineering

SQL run on cluster creates table different to SQL Warehouse endpoint

blobbles78
New Contributor II

I have a Personal cluster on version 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12) and a SQL Warehouse in a Databricks environment. When I use the following code to create a table in a catalog, I get different column types depending on whether it runs on the cluster or on the warehouse:

%sql
create or replace table [catalog].[schema].[test_table_name] 
using delta
comment 'This has a comment'
as
select 
  id, 
  name as new_name,
  created_date as new_created_date,
  current_timestamp() as test_timestamp,
  coalesce(name,'Replaced name') as test_coalesce_name,
  coalesce(id,'-1') as test_coalesce_id,
  coalesce(created_date,'2024-12-11') as test_coalesce_date
from (
  select 
    cast(col1 as int) as id, 
    cast(col2 as string) as name, 
    cast(col3 as date) as created_date
  from VALUES
  (1, 'Alice', '2024-12-01'),
  (2, 'Bob', '2024-12-02'),
  (3, 'Charlie', '2024-12-03'),
  (4, 'David', '2024-12-04'),
  (5, 'Eve', '2024-12-05'),
  (6, 'Frank', '2024-12-06'),
  (7, 'Grace', '2024-12-07'),
  (8, 'Hank', '2024-12-08'),
  (9, 'Ivy', '2024-12-09'),
  (10, 'Jack', '2024-12-10'),
  (11, NULL, '2024-12-11'),
  (NULL, 'NULL Values', NULL)
) as temp_table

When running on a SQL Warehouse, the column types for the coalesced columns are resolved correctly. However, when running on a cluster, they are not resolved and are instead converted to strings. Is this expected behaviour?
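
For reference, the difference is easy to see by describing the table after running the statement on each compute type (same placeholder table name as above):

%sql
-- Inspect the resolved column types after the CREATE TABLE above.
-- On the SQL Warehouse, test_coalesce_id and test_coalesce_date keep their
-- INT and DATE types; on the cluster both come back as STRING.
describe table [catalog].[schema].[test_table_name]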

I have tried this on two different Databricks environments with the same result.

1 ACCEPTED SOLUTION

Accepted Solutions

Walter_C
Databricks Employee

Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and test again?

View solution in original post

6 REPLIES

Walter_C
Databricks Employee

This should not be expected, as both compute types support coalesce. Have you tried the same in clusters with other DBRs to confirm whether this is only related to 15.4 or happens on any runtime?

blobbles78
New Contributor II

I have tried with Databricks Runtimes 14.3 and 16.0 with the same results. I wouldn't like to use anything prior to 14.3, so I don't want to try earlier versions. Remember that all the compute types support coalesce; it's just that using it on a DBR instead of a SQL Warehouse causes different behaviour. Specifically, on a DBR it stops respecting the data types coming through the coalesce.


Walter_C
Databricks Employee
Databricks Employee

Appreciate your testing. I will check internally on this and get back to you.

Walter_C
Databricks Employee

Can you try setting the Spark config spark.sql.ansi.enabled to true on the cluster and test again?
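
For a quick test, this can also be set at the session level in a notebook cell before re-running the CREATE TABLE (a minimal sketch; setting spark.sql.ansi.enabled to true in the cluster's Spark config achieves the same for all sessions):

%sql
-- Enable ANSI mode for the current SQL session only; add
-- spark.sql.ansi.enabled true to the cluster's Spark config
-- to apply it cluster-wide.
SET spark.sql.ansi.enabled = true;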

blobbles78
New Contributor II

Yes, that seems to work! So in order to get the same behaviour on our clusters as on our SQL Warehouse, we have to ensure the Spark setting spark.sql.ansi.enabled=true? That seems like a setting that should default to true; is there a reason it defaults to false?

Walter_C
Databricks Employee

It seems that, as per the docs, this setting is currently only true by default on SQL warehouses; on clusters it is set to false: https://docs.databricks.com/en/sql/language-manual/sql-ref-ansi-compliance.html#ansi-compliance-in-d...
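
If in doubt about which mode a given compute is using, the current value can be read back directly (SET with no value returns the effective setting):

%sql
-- Returns the effective value: true by default on a SQL Warehouse,
-- false by default on an all-purpose cluster unless overridden.
SET spark.sql.ansi.enabled;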
