Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Precision and scale are getting changed in the dataframe while casting to decimal

_Orc
New Contributor

When I run the query below in Databricks SQL, the precision and scale of the decimal column get changed.

Select typeof(COALESCE(Cast(3.45 as decimal(15,6)),0));

o/p: decimal(16,6)

expected o/p: decimal(15,6)

Any reason why the precision and scale are getting changed?

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

There is a reason indeed.

And the reason is type coercion:

In your COALESCE you pass 0 as the second value, and 0 is an integer.

So Spark will coerce it to a decimal type. To avoid losing any information, it needs 10 digits in front of the decimal point (the max value of a signed integer is 2147483647 -> 10 digits).

So when you specify (15,6) you only have 9 integer digits => Spark coerces this to (16,6).

The same for (8,0) => (10,0).

And (12,0) remains (12,0).

If you want to avoid this coercion, you can pass 0.00 (decimal) instead of 0 (int).

I totally agree that the type coercion can be confusing btw.
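
For a quick sanity check, here is a minimal sketch of the two cases (the expected types follow from the explanation above and the replies below; the exact typing of the literal may differ slightly between Spark versions):

%sql
-- 0 is an integer, treated as decimal(10,0), so the result widens to decimal(16,6)
Select typeof(COALESCE(Cast(3.45 as decimal(15,6)), 0));
-- a decimal literal such as 0.00 fits inside decimal(15,6), so the type is preserved
Select typeof(COALESCE(Cast(3.45 as decimal(15,6)), 0.00));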


5 REPLIES


AmanSehgal
Honored Contributor III

I believe the default precision changes as you change the scale.

For instance, if you set the precision and scale for the cast to (4,0), then Spark will default it to (10,0).

For (8,1), it'll default to (11,1)

For (8,2), it'll default to (12,2)

For (8,3), it'll default to (13,3)

For (8,4), it'll default to (14,4)

For (8,5), it'll default to (15,5)

For (8,6), it'll default to (16,6), and so on.

I think the logic is:

# Spark reserves enough integer digits to hold a 10-digit int (the integer literal 0)
if precision - scale < 10:
    precision = 10 + scale
else:
    precision = precision  # already wide enough, left unchanged
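
A quick spot-check of that rule in SQL (a minimal sketch reusing the COALESCE-with-integer pattern from the question; the expected types are the ones listed above):

%sql
-- precision - scale = 6 < 10, so the result defaults to decimal(12,2)
Select typeof(COALESCE(Cast(3.45 as decimal(8,2)), 0));
-- precision - scale = 12 >= 10, so decimal(12,0) is kept
Select typeof(COALESCE(Cast(3.45 as decimal(12,0)), 0));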

-werners-
Esteemed Contributor III

This is only correct if you pass an integer as the second argument of the coalesce.

For example, Select typeof(COALESCE(Cast(3.45 as decimal(11,2)), cast(0 as long))) returns decimal(22,2), because a long requires 20 digits in front of the decimal point.

If you pass a decimal, other rules apply.
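
As a minimal sketch of the contrast (the long case is the example above; the decimal case assumes Spark's usual widening rule of keeping the larger integer part and the larger scale, so it should preserve decimal(11,2)):

%sql
-- long is treated as decimal(20,0), so the result widens to decimal(22,2)
Select typeof(COALESCE(Cast(3.45 as decimal(11,2)), cast(0 as long)));
-- a small decimal such as decimal(1,0) fits within the 9 integer digits, so decimal(11,2) is kept
Select typeof(COALESCE(Cast(3.45 as decimal(11,2)), cast(0 as decimal(1,0))));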

RKNutalapati
Valued Contributor

Hi @Om Singh - you can try the below; the second statement will give you the required result:

%sql
Select typeof(COALESCE(Cast(3.45 as decimal(15,6)),0)) as CastToDecimal
Union All
Select typeof(Cast(COALESCE(3.45,0) as decimal(15,6))) as CastToDecimal;
 

Result of the above:

  CastToDecimal
1 decimal(16,6)
2 decimal(15,6)

berserkersap
Contributor

You can use

Select typeof(COALESCE(Cast(3.45 as decimal(15,6)), 0.0)); (i.e. 0.0 instead of 0)
