Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
Haritha_Sama
Contributor II

LakeFusion’s MDM solution delivers a single source of truth by leveraging advanced entity resolution and deduplication algorithms. By unifying fragmented data across systems, we ensure accurate, consistent, and reliable master records. Our platform enforces strict data governance policies, reducing duplicate records and improving data accuracy for mission-critical business processes.

Dashboard.png

Use cases

  • Data Quality
  • Deduplication
  • Entity Resolution
  • Patient/Provider/Payor 360
  • Customer/Product 360/Item Master

LakeFusion is optimized for large-scale data sets and real-time data operations, so master data stays current and accurate. It uses the Databricks Medallion Architecture to organize data into Bronze, Silver, and Gold layers, which improves data quality at each stage and supports efficient querying and analytics within a unified, scalable environment.

The platform is designed to manage a wide range of data types, including customer data, product listings, and transactional data, making it suitable for various industries such as retail, financial services, and healthcare.

By integrating advanced match and merge technologies, LakeFusion ensures data accuracy and consistency, reducing redundancies and errors. It also automates routine data management tasks, allowing organizations to focus on strategic initiatives and enhancing overall productivity.

In summary, LakeFusion empowers businesses to harness the full potential of their data assets, driving efficiency, innovation, and informed decision-making through accurate, scalable, and cost-effective master data management.

Key Features:

Match & Merge: Utilizes AI to identify and consolidate duplicate records into a single, accurate "golden record," improving data consistency and supporting better decision-making.

Standardization: Enforces consistent formats and values across all master data using predefined business rules, reducing errors and improving data integration and reporting accuracy.

Business Users: Offers an intuitive interface for managing MDM tasks, simplifying the identification and merging of duplicate records to ensure data accuracy.
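To make the Standardization feature concrete, here is a minimal Python sketch of rule-based standardization. It is not LakeFusion's implementation or API: the rule functions, the `RULES` table, and the sample record are hypothetical and only illustrate the idea of applying predefined rules per attribute.

```python
import re

def clean_name(value: str) -> str:
    # Collapse repeated whitespace and apply title case.
    return " ".join(value.split()).title()

def clean_phone(value: str) -> str:
    # Keep digits only, dropping punctuation and spaces.
    return re.sub(r"\D", "", value)

# Hypothetical business rules: attribute name -> normalizing function.
RULES = {"name": clean_name, "phone": clean_phone}

def standardize(record: dict) -> dict:
    # Apply the rule for each attribute; attributes without a rule pass through unchanged.
    return {key: RULES.get(key, lambda v: v)(value) for key, value in record.items()}

raw = {"name": "  thomas   CLARK ", "phone": "(555) 010-2030", "city": "Austin"}
print(standardize(raw))
# {'name': 'Thomas Clark', 'phone': '5550102030', 'city': 'Austin'}
```

Standardizing values before matching tends to make duplicates easier to detect, because records that differ only in formatting end up identical.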

Steps to create Golden records in LakeFusion

1. Create Dataset in LakeFusion

First, create a dataset in LakeFusion. A dataset is a reference to bronze tables, which typically contain raw data ingested from various sources.

2. Create Entities

Next, create entities, which correspond to silver-layer tables in Databricks. For each entity, define its attributes (columns), then map the dataset columns to those attributes. This mapping lets the system consolidate differently named source columns into unified attributes, which keeps the data consistent and simplifies downstream processing.
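The mapping step can be pictured with a small Python sketch. This is an illustration only, not how LakeFusion stores or applies mappings: the source names come from the example later in this post, while the column names, the attribute names, and the sample date are made up.

```python
# Hypothetical mappings: source dataset -> {source column: entity attribute}.
COLUMN_MAPPINGS = {
    "patient_pms": {"pt_name": "full_name", "dob": "birth_date"},
    "patient_hie": {"patient_nm": "full_name", "birth_dt": "birth_date"},
    "patient_ehr": {"name": "full_name", "date_of_birth": "birth_date"},
}

def to_entity_row(source: str, row: dict) -> dict:
    # Rename each source column to its unified attribute; missing columns become None.
    mapping = COLUMN_MAPPINGS[source]
    entity_row = {attr: row.get(col) for col, attr in mapping.items()}
    # Keep the origin so merged records can be traced back to their source.
    entity_row["source"] = source
    return entity_row

# Illustrative input row with a made-up date.
print(to_entity_row("patient_hie", {"patient_nm": "Thomas Clrk", "birth_dt": "1980-01-01"}))
```

Once every source row uses the same attribute names, rows from different systems can be compared directly.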

3. Apply Match Maven

Match Maven identifies and merges similar records based on matching criteria. Matching can be performed with Databricks GenAI models or custom models, enabling intelligent deduplication and entity resolution across datasets.
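To show what a matching score looks like in practice, the sketch below scores record pairs with plain string similarity from Python's standard library. This is a deliberately simple stand-in, not Match Maven's method, which relies on GenAI or custom models. The record IDs and the 0.75 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def match_score(a: str, b: str) -> float:
    # Case-insensitive similarity between 0.0 (no overlap) and 1.0 (identical).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_pairs(records: dict, threshold: float = 0.75) -> list:
    # Compare every pair of records and keep those scoring at or above the threshold.
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = match_score(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Hypothetical record IDs mapped to patient names.
records = {"hie-1": "Thomas Clrk", "ehr-1": "Thomas Clark", "ehr-2": "Maria Lopez"}
for pair in candidate_pairs(records):
    print(pair)
```

Comparing every pair is quadratic in the number of records, so this approach only suits small samples; large-scale matching typically narrows the candidates first.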

4. Entity Search

Within the Entities section, users can select a specific entity to review potential matches. Each record is assigned a matching score, indicating the degree of similarity between records.

Users have the option to:

  • Merge records if they are determined to be duplicates or belong to the same entity.
  • Discard records by marking them as "Not a Match", ensuring they are not merged incorrectly.

This functionality enhances data accuracy and integrity by allowing manual validation and refinement of entity resolution.
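The effect of these review decisions can be sketched in Python as follows. This is a hypothetical model of the workflow, not LakeFusion's internal logic: the record IDs, the verdict labels `"merge"` and `"not_a_match"`, and the grouping approach (union-find) are all assumptions.

```python
def apply_decisions(record_ids: list, decisions: list) -> list:
    # decisions: (id_a, id_b, verdict) tuples, verdict is "merge" or "not_a_match".
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent links to the group's representative, compressing the path.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for id_a, id_b, verdict in decisions:
        if verdict == "merge":
            # Join the two groups; "not_a_match" pairs are left separate.
            parent[find(id_b)] = find(id_a)

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), []).append(rid)
    return list(groups.values())

record_ids = ["pms-1", "hie-1", "ehr-1", "ehr-2"]
decisions = [
    ("pms-1", "hie-1", "merge"),
    ("hie-1", "ehr-1", "merge"),
    ("ehr-1", "ehr-2", "not_a_match"),
]
print(apply_decisions(record_ids, decisions))
# [['pms-1', 'hie-1', 'ehr-1'], ['ehr-2']]
```

Merges are transitive in this sketch: because pms-1 matches hie-1 and hie-1 matches ehr-1, all three end up in one group, while the rejected pair stays apart.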

Example

Before applying MDM with LakeFusion:

We have patient data arriving from three different sources:

  • patient_pms (Practice Management System) - Stores patient as "Thomas C"
  • patient_hie (Health Information Exchange) - Stores patient as "Thomas Clrk"
  • patient_ehr (Electronic Health Records) - Stores patient as "Thomas Clark"

Each system records the same patient with slight variations in their name. This inconsistency can lead to problems in analytics, billing, and patient care coordination.

Haritha_Sama_0-1741618928054.png

After applying MDM with LakeFusion:

We applied MDM using LakeFusion in our Databricks Lakehouse environment. LakeFusion helps merge and deduplicate records, ensuring that a single, accurate patient profile is retained. After applying MDM, our system now has a single golden record:

  • Patient Name: Thomas Clark

Haritha_Sama_1-1741618928057.png
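As a rough illustration of how the three source values could collapse into one golden value, the Python sketch below keeps the most complete (longest) name. That survivorship rule is an assumption made for this example; the post does not describe how LakeFusion chooses the surviving value, and the `golden_value` helper is hypothetical.

```python
def golden_value(values):
    # Drop empty values, then keep the longest one as the "most complete" candidate.
    candidates = [v.strip() for v in values if v and v.strip()]
    return max(candidates, key=len) if candidates else None

# Names as stored by the three source systems in the example above.
sources = {
    "patient_pms": "Thomas C",
    "patient_hie": "Thomas Clrk",
    "patient_ehr": "Thomas Clark",
}

golden = {"patient_name": golden_value(sources.values())}
print(golden)
# {'patient_name': 'Thomas Clark'}
```

A longest-value rule is easy to reason about but can pick a wrong value when the longest entry contains a typo, which is one reason manual review of matches remains useful.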

Get Started

Ready to see LakeFusion in action? Start your 14-day free trial today on Databricks Marketplace.
