Databricks Community

Dhruv-22 · ‎04-03-2024

I was reading the data type rules and found about least common type.

I have a doubt. What is the least common type of STRING and INT? The referred link gives the following example saying the least common type is BIGINT.

-- The least common type between an INT and STRING is BIGINT
> SELECT typeof(coalesce(5, '6'));
  BIGINT

However, running this on Databricks 13.3, Spark 3.4.1 and Scala 2.12 configurations gives the following output.

> SELECT typeof(coalesce(5, '6'));
  STRING

If I run explain on the query, then I get the following output.

> EXPLAIN EXTENDED SELECT typeof(coalesce(5, '6'));
== Analyzed Logical Plan ==
typeof(coalesce(5, 6)): string
Project [typeof(coalesce(cast(5 as string), 6)) AS typeof(coalesce(5, 6))#239637]
+- OneRowRelation

It shows that INT is cast to STRING so the least common type should be STRING.

However, if I run an equal to operation between STRING and INT, then the explain query gives different output.

> EXPLAIN EXTENDED SELECT '1.00' = 1;
== Analyzed Logical Plan ==
(1.00 = 1): boolean
Project [(cast(1.00 as int) = 1) AS (1.00 = 1)#239661]
+- OneRowRelation

Here STRING is cast to INT.

Both coalesce and = rely on the least common type and give different results. So, what is exactly the least common type of STRING and INT? Also, which is more narrower STRING or INT? Any simple data type can be converted to STRING without any data loss, like an INT can be converted to STRING without any loss. However, the reverse is not true. If a STRING containing a decimal value is converted to INT, there would be a value loss. So shouldn't STRING be the wider type?