cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Understanding least common type in databricks

Dhruv-22
New Contributor III
I was reading the data type rules and found about least common type.

I have a doubt. What is the least common type of STRING and INT? The referred link gives the following example saying the least common type is BIGINT.

-- The least common type between an INT and STRING is BIGINT
> SELECT typeof(coalesce(5, '6'));
  BIGINT

However, running this on Databricks 13.3, Spark 3.4.1 and Scala 2.12 configurations gives the following output.

> SELECT typeof(coalesce(5, '6'));
  STRING

If I run explain on the query, then I get the following output.

> EXPLAIN EXTENDED SELECT typeof(coalesce(5, '6'));
== Analyzed Logical Plan ==
typeof(coalesce(5, 6)): string
Project [typeof(coalesce(cast(5 as string), 6)) AS typeof(coalesce(5, 6))#239637]
+- OneRowRelation

It shows that INT is cast to STRING so the least common type should be STRING.

However, if I run an equal to operation between STRING and INT, then the explain query gives different output.

> EXPLAIN EXTENDED SELECT '1.00' = 1;
== Analyzed Logical Plan ==
(1.00 = 1): boolean
Project [(cast(1.00 as int) = 1) AS (1.00 = 1)#239661]
+- OneRowRelation

Here STRING is cast to INT.

Both coalesce and = rely on the least common type and give different results. So, what is exactly the least common type of STRING and INT? Also, which is more narrower STRING or INT? Any simple data type can be converted to STRING without any data loss, like an INT can be converted to STRING without any loss. However, the reverse is not true. If a STRING containing a decimal value is converted to INT, there would be a value loss. So shouldn't STRING be the wider type?

1 REPLY 1

Dhruv-22
New Contributor III

Hey, if STRING is the wider type then why did you say that the least common type between STRING and INT is STRING?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group