Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Databricks master data management capabilities

SHG97
New Contributor II

Hi there,

Please, I am trying to understand if Databricks is able to support master data management capabilities. Particularly, focusing on the following ones:

- Integrate and link different data systems: Connect various systems and make sure the data stays consistent across all of them (e.g., when a record is updated in one system, the change is automatically reflected in all connected systems)

- Manage data standardization rules: Establish and enforce rules to ensure data remains consistent across the organization (e.g., defining a standard format for date fields)

Please, any help or guidance is highly appreciated.

Thanks a lot!

3 REPLIES

SathyaSDE
Contributor

Hi,

Databricks is well suited for this and supports the capabilities you asked about in the OP.

1) You can ingest data from various sources, in both batch and streaming mode and in various formats, through the Lakehouse architecture

2) For data consistency while reading, as well as durability when manipulating data, please read about Delta Lake, an open storage format with ACID transaction support

3) Yes, you can enforce constraints and schema enforcement, and use RBAC and Unity Catalog to centrally manage data governance, compliance, etc.
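To make the standardization-rule idea from the question concrete (the standard date-format example), here is a minimal plain-Python sketch of the kind of rule you would apply before writing curated data to a Delta table. The accepted input formats are assumptions for the illustration, not a Databricks API:

```python
from datetime import datetime

# Illustrative standardization rule: normalize mixed date formats to ISO 8601.
# The list of accepted input formats is an assumption for this example.
INPUT_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

def standardize_date(value: str) -> str:
    """Return the date as YYYY-MM-DD, or raise if no known format matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")
```

In a Databricks pipeline the same rule would typically be expressed as a UDF or a built-in date function applied in the transformation step, so every downstream table sees one canonical format.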

Here are some reference links:

https://docs.databricks.com/en/ingestion/index.html

https://learn.microsoft.com/en-us/azure/databricks/delta/

https://www.databricks.com/product/unity-catalog

Rjdudley
Contributor

Databricks supports MDM in the way that any off-the-shelf database can--you just have to write all the code to handle the data standardization, survivorship and entity resolution rules.  You can absolutely do MDM in Databricks: the medallion architecture corresponds nicely to how traditional MDM systems categorize data, and the flexibility of Delta Live Tables pipelines makes it straightforward to write the code you need.  Streaming tables can be used for near-real-time MDM, while the scalability of Spark/Photon compute means you can also handle gigantic batches of data.
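As a concrete illustration of the survivorship rules mentioned above, here is a hedged plain-Python sketch of one common policy (most-recent non-null value wins when merging source records into a golden record). The record shape, including the `updated_at` field, is an assumption for the example:

```python
# Hypothetical survivorship rule: for each field, keep the most recently
# updated non-null value across source records for the same entity.
# The dict shape with an "updated_at" ordering key is an assumption.
def golden_record(records: list[dict]) -> dict:
    ordered = sorted(records, key=lambda r: r["updated_at"])  # oldest first
    merged: dict = {}
    for record in ordered:
        for field, value in record.items():
            if field == "updated_at":
                continue
            if value is not None:
                merged[field] = value  # later records overwrite earlier ones
    return merged
```

In a real pipeline this logic would run per entity cluster after entity resolution, typically as a transformation in the silver-to-gold step.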

You can import Python libraries to assist with your coding and use them in your Delta Live Tables pipelines.  But you are "build first"; I've worked at places with that mentality and it's fine, you just have to maintain a lot of code (though you can get very customized MDM at potentially lower cost).

An advantage that Databricks has is the Databricks Marketplace.  You can subscribe to services from Dun and Bradstreet or Experian which will do the standardization and entity resolution for you, then return the golden records.  Another great feature Databricks has is the data expectations, which are used to measure data quality before any MDM work is done, and can quarantine bad data to keep it from ruining your golden records.  Both the Marketplace services and expectations are used in your DLT pipelines.
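The expectations pattern described above (named data-quality rules that quarantine failing rows before they reach your golden records) can be sketched without the DLT library. In Delta Live Tables you would declare these rules with expectation decorators instead; this plain-Python version, with made-up rules, just illustrates the routing:

```python
# Sketch of the expectations idea: each rule is a named predicate; rows that
# fail any rule are quarantined instead of polluting the golden records.
# The specific rules and row fields here are assumptions for the example.
RULES = {
    "valid_email": lambda row: "@" in (row.get("email") or ""),
    "has_name": lambda row: bool(row.get("name")),
}

def apply_expectations(rows):
    passed, quarantined = [], []
    for row in rows:
        failures = [name for name, check in RULES.items() if not check(row)]
        if failures:
            quarantined.append({**row, "failed_rules": failures})
        else:
            passed.append(row)
    return passed, quarantined
```

Quarantined rows carry the list of failed rules, which makes it easy to report on data quality and remediate sources before re-running the merge.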

I've built entirely custom MDM systems on SQL Server.  They work, and they fit easily into your enterprise's service inventory, but they really require a dedicated team to maintain.  SaaS MDM systems like Reltio have less impact on your enterprise, since all you maintain is the rules.  I describe systems like Informatica as "an enterprise lifestyle": your business conforms to how they run.

In the end, all MDM systems cost money and take effort.  The decision largely comes down to whether you want to pay your own developers and system engineers, or someone else's.  All MDM systems have the same need for data quality, data governance and business rules governance, so those are the same across the board.  It is possible for different companies to take different paths, and all be right.

Rjdudley
Contributor

I failed to mention above that Databricks has several solution accelerators which support MDM/ER types of work.  They are meant as examples of how to approach the problem, not to be used directly out of the box.

Customer Entity Resolution | Databricks

Entity Resolution for Public Sector | Databricks

Fuzzy Item Matching | Databricks
