Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is it ok to join on the decimal type fields? How does it affect the performance?

DK03
Contributor
2 REPLIES

-werners-
Esteemed Contributor III

Sure, it is OK.

The performance of a join is mainly determined by the shuffle itself, potential data skew, and the type of join (broadcast hash join, shuffle hash join, etc.).

UmaMahesh1
Honored Contributor III

As @Werner Stinckens said, it would be OK. But joins on decimal columns are generally not recommended, because other factors such as precision and length come into play.
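To make the precision point concrete, here is a minimal sketch in plain Python (the standard `decimal` module, not Spark). It shows how the same quantity stored at two different scales can fail an equality match until both sides are brought to a common scale. The helper name `to_common_scale` and the sample values are hypothetical. In Spark the analogous step would be to cast both join columns to the same decimal type before joining.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_common_scale(value: Decimal, scale: int = 2) -> Decimal:
    """Round a Decimal to a fixed number of fractional digits (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


# Assumed sample keys: the same quantity kept at two different scales.
left_key = Decimal("12.3450")   # stored with 4 fractional digits
right_key = Decimal("12.35")    # stored with 2 fractional digits

# A direct equality comparison treats these as different keys.
raw_match = left_key == right_key

# After rounding both sides to the same scale, the keys line up.
aligned_match = to_common_scale(left_key) == to_common_scale(right_key)
```

Which scale and rounding mode to use depends on how the data was produced, so treat the values above as placeholders.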

Also, when you are joining on decimal columns, be sure to check that the absolute value of the difference between the two column values is nearly 0, within a tolerance based on your requirement, e.g. 0.0000000001. You don't want to mess up the join just because of a minuscule difference or error that crept in through some transformation somewhere or through data quality issues.
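The tolerance check described above can be sketched in plain Python as follows. This is an illustration of the idea only, not Spark code: `approx_join` and the sample rows are hypothetical, and the threshold is the example value from the post.

```python
from decimal import Decimal

# Example threshold from the post; adjust it to your own requirement.
TOLERANCE = Decimal("0.0000000001")


def approx_join(left_rows, right_rows, tolerance=TOLERANCE):
    """Nested-loop join pairing rows whose keys differ by at most `tolerance`."""
    return [
        (l_key, l_val, r_val)
        for l_key, l_val in left_rows
        for r_key, r_val in right_rows
        if abs(l_key - r_key) <= tolerance
    ]


# Assumed sample data: (decimal key, payload) pairs.
left = [(Decimal("1.5000000000"), "a"), (Decimal("2.0"), "b")]
right = [(Decimal("1.50000000001"), "x"), (Decimal("2.1"), "y")]

# Only the first pair of keys is within the tolerance of each other.
matches = approx_join(left, right)
```

A condition like this is no longer a plain equality join, so it is typically more expensive than matching on exact keys. Where possible, cleaning or rounding the keys upstream and then joining on equality may be the cheaper option.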

Research the data well before joining on decimal columns.

Cheers.

Uma Mahesh D
