Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
Can data be unified based on client profile (unified profile) in Databricks?

Ruby8376
Valued Contributor

Hi All,

My question is about how data in Salesforce Data Cloud gets unified based on client profiles. Can something similar be done with data in Databricks? I believe Unity Catalog just provides a unified layer for security and governance. Is there a way to unify the data as well, or will it involve writing business rules and transforming the data accordingly in code?

3 replies

Ruby8376
Valued Contributor

@-werners- @Retired_mod can you please help?

-werners-
Esteemed Contributor III

You want to identify actual persons based on one or more profiles (based on e-mail address, etc.). That is not available out of the box in Databricks. The 'unified' in Databricks means that you have a single platform for several data topics, such as engineering, analytics and ML.
What you are looking for is in fact a Customer Data Platform (CDP), which can uniquely identify a natural person based on the stored characteristics.
How Salesforce does it is probably proprietary, but most likely they use a combination of name, address, e-mail address, etc. to check whether different profiles point to the same natural person.
Basically, there are two approaches to identifying a person: deterministic and probabilistic.
The method I already described is deterministic (based on hard rules). A more advanced technique, which can lead to better or worse results than the deterministic method, is the probabilistic method. Here we try to identify a person using probabilistic models, so this is a form of statistical learning/machine learning.
A combination of both is also possible.
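The deterministic approach described above can be sketched in plain Python: normalize a few fields, derive match keys from hard rules, and merge every pair of profiles that shares a key. The `Profile` fields, the two rules (same e-mail, or same name plus same address) and the sample records are hypothetical examples, not what Salesforce actually does; in a Databricks notebook you would more likely express the same logic with Spark DataFrame joins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Profile:
    # Hypothetical profile shape; real sources (CRM, web shop, ...) have their own schemas.
    profile_id: str
    name: Optional[str]
    email: Optional[str]
    address: Optional[str]


def normalize(value: Optional[str]) -> Optional[str]:
    """Lower-case and collapse whitespace so formatting differences do not block a match."""
    if value is None:
        return None
    cleaned = " ".join(value.lower().split())
    return cleaned or None


class UnionFind:
    """Disjoint-set structure: merges profiles transitively into clusters."""

    def __init__(self, items: List[str]) -> None:
        self.parent = {item: item for item in items}

    def find(self, item: str) -> str:
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def match_keys(p: Profile) -> List[str]:
    """Hypothetical hard rules: same e-mail, or same name AND same address."""
    keys = []
    email = normalize(p.email)
    if email:
        keys.append("email:" + email)
    name, address = normalize(p.name), normalize(p.address)
    if name and address:
        keys.append("name_addr:" + name + "|" + address)
    return keys


def resolve(profiles: List[Profile]) -> Dict[str, str]:
    """Map every profile_id to a cluster id shared by all profiles of the same person."""
    uf = UnionFind([p.profile_id for p in profiles])
    first_seen: Dict[str, str] = {}
    for p in profiles:
        for key in match_keys(p):
            if key in first_seen:
                uf.union(first_seen[key], p.profile_id)
            else:
                first_seen[key] = p.profile_id
    return {p.profile_id: uf.find(p.profile_id) for p in profiles}


# Hypothetical sample records from three different source systems.
profiles = [
    Profile("crm-1", "Jane Doe", "Jane.Doe@example.com", "1 Main St"),
    Profile("web-7", "J. Doe", "jane.doe@example.com ", None),
    Profile("shop-3", "Jane  Doe", None, "1 main st"),
    Profile("crm-2", "John Roe", "john@example.com", "5 Oak Ave"),
]
clusters = resolve(profiles)
print(clusters)
# {'crm-1': 'crm-1', 'web-7': 'crm-1', 'shop-3': 'crm-1', 'crm-2': 'crm-2'}
```

Union-find makes the merge transitive: `web-7` and `shop-3` share no key with each other, but both link to `crm-1`, so all three end up in one cluster. That is also the main risk of hard rules: a single shared household e-mail address can chain different people together.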

As I already mentioned, Databricks itself does not have its own CDP, but that does not mean you cannot build one using Databricks.
The methods I described can be applied in notebooks. There is also an interesting blog post on the Databricks site about Arc, a probabilistic model:
https://www.databricks.com/blog/linking-unlinkables-simple-automated-scalable-data-linking-databrick...
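A probabilistic matcher can be sketched with the classic Fellegi-Sunter scoring: every field that agrees or disagrees adds a log-likelihood weight, and pairs scoring above a threshold are treated as the same person. This is a generic illustration, not how Arc or Salesforce work internally; the field names, the m/u probabilities and the threshold are hypothetical, and real systems estimate such values from the data. The `is_same_person` function also shows a combined approach: a hard rule is checked first, and the score is only used when that rule does not fire.

```python
import math
from typing import Dict, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical probabilities per field (normally estimated from data, not hand-picked):
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "email": (0.95, 0.001),
    "name": (0.90, 0.01),
    "postcode": (0.85, 0.05),
}
MATCH_THRESHOLD = 5.0  # hypothetical cut-off on the summed log2 weights


def norm(value: Optional[str]) -> Optional[str]:
    """Lower-case and collapse whitespace; empty values become None."""
    return " ".join(value.lower().split()) or None if value else None


def field_weight(field: str, agrees: bool) -> float:
    """Fellegi-Sunter weight: log2 likelihood ratio for agreement or disagreement."""
    m, u = FIELD_PROBS[field]
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def match_score(a: Record, b: Record) -> float:
    """Sum the field weights; a value missing on either side contributes nothing."""
    score = 0.0
    for field in FIELD_PROBS:
        va, vb = norm(a.get(field)), norm(b.get(field))
        if va is None or vb is None:
            continue
        score += field_weight(field, va == vb)
    return score


def is_same_person(a: Record, b: Record) -> bool:
    """Combined approach: deterministic rule first, probabilistic score otherwise."""
    email_a, email_b = norm(a.get("email")), norm(b.get("email"))
    if email_a is not None and email_a == email_b:
        return True  # hard rule: identical e-mail address
    return match_score(a, b) >= MATCH_THRESHOLD


# Hypothetical sample records.
a = {"email": "jane@work.com", "name": "Jane Doe", "postcode": "1000"}
b = {"email": "jane.doe@home.com", "name": "jane doe", "postcode": "1000"}
c = {"email": None, "name": "Jane Doe", "postcode": "9999"}
print(round(match_score(a, b), 2), is_same_person(a, b))  # 6.26 True
print(round(match_score(a, c), 2), is_same_person(a, c))  # 3.83 False
```

The pair `a`/`b` shows where the probabilistic method can do better than a hard rule: the e-mail addresses differ, yet matching name and postcode push the score over the threshold. The pair `a`/`c` shares a name but not a postcode and stays below it; where exactly that line falls depends entirely on the (here hypothetical) probabilities and threshold.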

There is a lot to be found online. But beware: the Salesforce CDP does not only identify persons; there is also the whole UI experience, flexible filtering, creating campaigns, etc. That is something I do not see in Databricks.

Ruby8376
Valued Contributor

Thank you so much !!
