Data Engineering

Is it ok to join on the decimal type fields? How does it affect the performance?

DK03
Contributor
2 REPLIES

-werners-
Esteemed Contributor III

Sure it is ok.

The performance of a join is mainly determined by the shuffle itself, potential data skew, and the type of join (BroadcastHashJoin, ShuffledHashJoin, etc.).

UmaMahesh1
Honored Contributor III

As @Werner Stinckens said, it would be OK. But joins on decimal columns are generally not recommended, because other factors come into play, such as the precision and scale of the two columns.
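To illustrate the precision point, here is a minimal sketch in plain Python using the standard `decimal` module rather than Spark. The `normalize` helper and the sample values are hypothetical, chosen only to show that two decimals representing the same amount can fail an equality match until both sides are rounded to a shared scale.

```python
from decimal import Decimal, ROUND_HALF_UP

def normalize(value: Decimal, scale: int = 2) -> Decimal:
    """Round a Decimal to a fixed number of fractional digits (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

# Assumed sample values: the same amount, but one side picked up extra digits upstream.
left = Decimal("19.990")
right = Decimal("19.9900001")

print(left == right)                        # exact comparison of the raw keys
print(normalize(left) == normalize(right))  # comparison after rounding to scale 2
```

The same idea carries over to a join key: if both sides are brought to one agreed scale before the join, an equality condition behaves predictably. Which scale and rounding mode are appropriate depends on your data.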

Also, when you join on decimal columns, check whether the absolute value of the difference between the two column values is close enough to 0 for your requirement, e.g. within 0.0000000001. You don't want the join to miss rows because of a tiny difference or error that crept in through a transformation somewhere or through data quality issues.
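The tolerance check described above can be sketched in plain Python with the standard `decimal` module. This is an illustration only, not Spark code: `tolerant_join`, the row layout, and the sample rows are all hypothetical, and the tolerance is the example value from the reply.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.0000000001")  # example tolerance; pick one that fits your requirement

def tolerant_join(left_rows, right_rows, tol=TOLERANCE):
    """Pair rows whose decimal keys differ by at most `tol` (simple nested loop)."""
    return [
        (l_key, l_val, r_val)
        for l_key, l_val in left_rows
        for r_key, r_val in right_rows
        if abs(l_key - r_key) <= tol
    ]

# Assumed sample data: (decimal key, label) pairs.
orders = [(Decimal("10.5000000000"), "order-1"), (Decimal("20.25"), "order-2")]
payments = [(Decimal("10.50000000001"), "pay-A"), (Decimal("20.26"), "pay-B")]

matches = tolerant_join(orders, payments)
print(matches)
```

Note that this compares every left row with every right row. In Spark, a join condition based on a difference rather than equality generally cannot use the hash-based join strategies, so it may be cheaper to round both sides to a shared scale and join on equality if your requirement allows it.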

Do your research before joining on decimal columns.

Cheers.
