Can data be unified based on client profile (unified profile) in Databricks?

Ruby8376
Valued Contributor

Hi All,

My question is about how data in Salesforce Data Cloud gets unified based on client profiles. Can something similar be done with data in Databricks? I believe Unity Catalog only provides a unified layer for security and governance. Is there a way to unify the data as well, or will it involve writing business rules and transforming the data accordingly in code?

3 REPLIES

Ruby8376
Valued Contributor

@-werners- @Kaniz_Fatma can you please help?

-werners-
Esteemed Contributor III

You want to identify actual persons based on one or more profiles (using e-mail address etc.). That is not available out of the box in Databricks. The 'unified' in Databricks means you have a single platform for several data disciplines such as engineering, analytics, and ML.
What you are looking for is in fact a Customer Data Platform (CDP), which can uniquely identify a natural person based on the characteristics stored.
How Salesforce does it is probably a secret, but very likely they use a combination of name, address, e-mail address etc. to check whether different profiles point to the same natural person.
Basically there are two approaches to identifying a person: deterministic and probabilistic.
The method I just described is deterministic (based on hard rules). A more advanced technique, which can give better or worse results than the deterministic method, is the probabilistic method. Here we try to identify a person using probabilistic models, so this is a form of statistical learning/machine learning.
A combination of both is also possible.
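
To make the deterministic approach concrete, here is a minimal sketch in plain Python. It assumes each profile is a dict with hypothetical `id`, `email`, and `phone` fields, and it applies one hard rule: two profiles belong to the same person if they share a normalized e-mail address or phone number. The field names and the normalization rules are illustrative assumptions, not how Salesforce or Databricks do it. In a notebook the same idea would typically be expressed as joins on the normalized keys.

```python
from collections import defaultdict


def normalize_email(email):
    """Lower-case and trim an e-mail address; None if missing."""
    return email.strip().lower() if email else None


def normalize_phone(phone):
    """Keep digits only; None if missing or empty after cleaning."""
    if not phone:
        return None
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits or None


def unify_profiles(profiles):
    """Group profile ids that share a normalized e-mail or phone (hard rule)."""
    parent = {p["id"]: p["id"] for p in profiles}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first profile id with it
    for p in profiles:
        keys = (
            ("email", normalize_email(p.get("email"))),
            ("phone", normalize_phone(p.get("phone"))),
        )
        for key in keys:
            if key[1] is None:
                continue  # a missing value never links two profiles
            if key in first_seen:
                union(p["id"], first_seen[key])
            else:
                first_seen[key] = p["id"]

    groups = defaultdict(list)
    for p in profiles:
        groups[find(p["id"])].append(p["id"])
    return sorted(sorted(g) for g in groups.values())


profiles = [
    {"id": 1, "email": "Ann.Lee@Example.com", "phone": None},
    {"id": 2, "email": "ann.lee@example.com ", "phone": "+1 555-0100"},
    {"id": 3, "email": "a.lee@work.example", "phone": "15550100"},
    {"id": 4, "email": "bob@example.com", "phone": None},
]
print(unify_profiles(profiles))  # [[1, 2, 3], [4]]
```

Profiles 1 and 2 are linked by e-mail, and 2 and 3 by phone, so all three end up in one unified profile even though 1 and 3 share no key directly. That transitive linking is why a union-find structure is used here instead of a single pairwise comparison.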

As I mentioned, Databricks does not have its own CDP, but that does not mean you cannot build one using Databricks.
The methods I described can be applied in notebooks. There is also an interesting blog on the Databricks site about Arc, a probabilistic model:
https://www.databricks.com/blog/linking-unlinkables-simple-automated-scalable-data-linking-databrick...
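
For a feel of what a probabilistic match looks like, here is a small sketch of Fellegi-Sunter style scoring in plain Python. It is not Arc's API and not taken from the blog. The field names and the `m`/`u` probabilities below are made-up illustrative values; a real setup would estimate them from data. Each field that agrees or disagrees adds a log-likelihood weight, and the total is turned into a match probability using an assumed prior.

```python
import math
from difflib import SequenceMatcher

# Assumed, illustrative parameters (NOT estimated from real data):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "name": {"m": 0.95, "u": 0.01},
    "email": {"m": 0.90, "u": 0.001},
    "city": {"m": 0.85, "u": 0.10},
}


def agrees(a, b, threshold=0.85):
    """Fuzzy agreement on one field; None means a value is missing."""
    if not a or not b:
        return None
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over all fields with evidence."""
    total = 0.0
    for field, p in FIELD_PARAMS.items():
        outcome = agrees(rec_a.get(field), rec_b.get(field))
        if outcome is None:
            continue  # missing value: contributes no evidence either way
        if outcome:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def match_probability(rec_a, rec_b, prior=0.01):
    """Posterior probability that two records are the same person.

    `prior` is the assumed share of record pairs that are true matches.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** match_weight(rec_a, rec_b)
    return posterior_odds / (1 + posterior_odds)


a = {"name": "Jonathan Smith", "email": "jon.smith@example.com", "city": "Boston"}
b = {"name": "Jonathon Smith", "email": "jon.smith@example.com", "city": "Boston"}
c = {"name": "Maria Garcia", "email": "m.garcia@example.org", "city": "Boston"}

print(round(match_probability(a, b), 4))  # close to 1: likely the same person
print(round(match_probability(a, c), 4))  # close to 0: likely different people
```

Unlike a hard rule, the misspelled name in record `b` does not block the match, and sharing only a city (records `a` and `c`) is not enough to create one. The price is that thresholds and parameters have to be tuned, which is where the results can end up better or worse than the deterministic rules.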

There is a lot to be found online. But beware: Salesforce CDP does not only identify persons. There is also the whole UI experience, flexible filtering, campaign creation etc. That is something I do not see in Databricks.

Ruby8376
Valued Contributor

Thank you so much !!
