How to deal with Decimal data type arithmetic operations?

berserkersap
Contributor

I am dealing with values ranging from 10^-9 to 10^9; the sum of these values can go up to 10^20, and I need accuracy.

So I wanted to use the Decimal data type [using SQL in the Data Science & Engineering workspace].

However, I got to know about a peculiar behavior of the Decimal data type: Spark adjusts the precision and scale of an arithmetic result to avoid overflow, which can sacrifice accuracy.

(See: Decimal Precision Scale, Decimal Precision Loss)

Due to this behavior, I am facing issues while multiplying or dividing numbers. [My use case involves a series of multiplications and divisions to transform data.]

When I multiply two numbers, the precision changes, and when I multiply again, there is a loss of accuracy.

For example:

SELECT CAST(1234.456 AS decimal(20,8)) * CAST(23478.89076 AS decimal(20,8)) * CAST(345678.125567 AS decimal(20,8))

The answer I am getting: 10019016421573.927554

Real answer: 10019016421573.92755433905952
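
To see where the accuracy goes, you can inspect the intermediate result types. This is a sketch assuming Spark SQL 3.0+ (where the typeof function is available) and Spark's default decimal promotion rules:

SELECT typeof(CAST(1234.456 AS decimal(20,8)) * CAST(23478.89076 AS decimal(20,8)));
-- decimal(38,13): the exact result type would be decimal(41,16), which exceeds
-- the 38-digit limit, so Spark caps the precision at 38 and trims the scale to 13

SELECT typeof(CAST(1234.456 AS decimal(20,8)) * CAST(23478.89076 AS decimal(20,8)) * CAST(345678.125567 AS decimal(20,8)));
-- decimal(38,6): the second multiplication overflows again and the scale drops
-- to the minimum of 6, which is why only 6 decimal places survive above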

I am looking for a scale of around 8.

Is there any specific way to handle this kind of scenario?

Note: All the numeric columns in my case will have the same data type.

1 REPLY

berserkersap
Contributor

Hello Everyone,

I understand that there is no single best answer to this question.

So, I can only share the workaround I found while searching online.

The method I found works when:

  1. You know the range of the values you deal with (not just the input data, but also the values produced by transformations like multiplication, addition, and division).
  2. You know the precision you want.
  3. The precision plus range fits within the decimal(38,6) datatype.

The method is simple: if you want p digits of decimal accuracy, multiply all the required numeric columns by 10^p and keep the datatype as decimal(38,6). Since decimal(38,6) is the default type Spark falls back to when a mathematical operation exceeds the precision limit, the datatype will not change as operations are applied.

For example, if you need 6 digits of decimal precision, multiply the column by 10^6 and then do the operations. But remember that if you multiply two scaled columns, the resulting column carries a 10^12 multiplier. So, use ((C1/10^6)*C2) for such operations, as sketched below.
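
A minimal sketch of the scaled multiplication, assuming a hypothetical table t whose columns c1 and c2 have already been multiplied by 10^6 and stored as decimal(38,6):

-- c1 and c2 each carry a 10^6 multiplier; dividing one operand by 10^6 first
-- keeps a single 10^6 multiplier in the product
SELECT CAST(c1 / 1000000 AS decimal(38,6)) * c2 AS product_scaled
FROM t;
-- to read the true value, divide product_scaled by 10^6 at the end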

However, it is crucial to ensure that there is no overflow or loss of precision by adhering to the three points above. This can be difficult, since we may not know the resulting sum of a group in advance, so use the method judiciously; a rough check is sketched below.
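
As one such check (again on the hypothetical table t), you can look at the magnitudes before trusting grouped results; in Spark, an overflowing decimal aggregate returns NULL by default (or raises an error when spark.sql.ansi.enabled is on):

SELECT MAX(ABS(c1)) AS max_abs_value, SUM(c1) AS total FROM t;
-- a NULL total signals that the sum overflowed decimal(38,6)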

Python's decimal module offers great flexibility in this regard (C++ has even more robust libraries), but it cannot be used in PySpark, since PySpark runs on the Spark engine, which doesn't support that type.

Please feel free to correct me if I am wrong.
