Hi, we recently made an upgrade to our Databricks warehouse, transitioning from SQL Classic to SQL PRO. However, we encountered the following error message when attempting to execute the "INSERT INTO" table query with a "REPLACE WHERE" predicate that employs the "IN" operator with a date column. This query was previously working fine with the Classic warehouse.
Error Message:
2023-05-17 (of class java.time.LocalDate)
Steps to reproduce:
1. Create a source table partitioned on product and country_code and insert values.
create
or replace table <catalog_name>.`default`.`source_sales` (
product_code int,
product string,
country_code string,
txn_date date,
amount int
) using delta partitioned by (product, country_code);
insert into
<catalog_name>.`default`.`source_sales`
values
('12', 'product1', 'in', DATE '2023-04-19', 1234),
('21', 'product2', 'in', DATE '2023-04-20', 1243),
('34', 'product3', 'in', DATE '2023-04-21', 2345),
('56', 'product4', 'uk', DATE '2023-05-03', 3456),
('78', 'product5', 'us', DATE '2022-05-17', 3214),
('89', 'product6', 'us', DATE '2022-05-18', 3254);
2. Create a target table inserting all the values from selected columns from the source table. The insertion should be successful
create
or replace table <catalog_name>.`default`.`target_sales` using delta partitioned by (product, country_code) tblproperties (
'spark.databricks.delta.replacewhere.constraintcheck.enabled' = 'false'
) as
select
product,
country_code,
txn_date,
amount
from
<catalog_name>.`default`.`source_sales`;
3. Update and insert new rows in source table
update
<catalog_name>.`default`.`source_sales`
set
amount = '1200'
where
country_code in ('in', 'us');
insert into
<catalog_name>.`default`.`source_sales`
values
('91', 'product7', 'us', DATE '2023-05-19', 5678);
4. Update/selectively overwrite the target table with replace where
create or replace temporary view `sales_temp` as
select
product,
country_code,
txn_date,
amount
from
<catalog_name>.`default`.`source_sales`
where country_code in ('in', 'us')
;
insert into
table <catalog_name>.`default`.`target_sales` replace
where
(
(
country_code = 'us'
and product = 'product3'
and txn_date in ('2023-05-17', '2023-05-18', '2023-05-19')
)
OR (
country_code = 'in'
and product = 'product1'
and txn_date in ('2023-04-19', '2023-04-20', '2023-04-21')
)
)
select
*
from
(`sales_temp`);
Expected Behaviour:
The target_sales table should be updated with the data from source_sales table as per the replace where condition
Observed Behaviour:
When we use txn_date column along with 'in' operator in replace where predicate we get the error as: 2023-05-17 (of class java.time.LocalDate)
Version:
Databricks Runtime Version: 12.2 LTS
JDBC URL: 2.6.25 or later
Channel: Current(v 2023.30)
Note:
- When we use the 'BETWEEN' operator instead of the 'IN' operator with the txn_date column, the 'INSERT INTO REPLACE WHERE' query executes successfully. However, for our use case, we specifically need to use the 'IN' operator. We encounter an error only when we compare the txn_date column with a list of dates using the 'IN' operator.
- We have also tried casting the string values used in the "IN" operator to DATE type, but the error persists.
The error description provides very little information about the issue and hence haven't been able to find a solution. We would appreciate any help or guidance you can provide to resolve this issue. Thank you for your assistance in advance.