Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
abhay-jalisatgi
Databricks Employee

Introduction

In this post, we’ll explore how Magnite and Databricks collaborated to build:

  • A foundation for secure and automated data publishing: a Python wheel file that customers install to set up the share, create the recipient, enable Delta Sharing, configure Change Data Feed (CDF), and validate the Membership and Taxonomy tables, streamlining setup and enforcing consistent, standards-compliant publishing.
  • Automated data consumption for Magnite: a workflow that listens for new customer shares, validates table standards, applies transformations, and delivers data into Magnite’s environment in a scalable, observable manner.
  • Accelerated adoption through the Databricks Marketplace listing: complete with schema requirements, prerequisites, runbooks and customer-facing content to ensure consistent data quality, governance, and secure access across all onboarded customers.


 

The Challenge

In today’s data-driven ecosystem, Magnite relies on its customers to share clean, trusted datasets. This inbound data-sharing setup often comes with challenges:

  • Data Producers (Magnite’s customers) need to configure sharing endpoints, recipients and security settings correctly.
  • Data Consumer (Magnite) must validate schemas, consume/transform the data, and maintain governance across many disparate shares.
  • Onboarding and scaling this across many customers becomes labour-intensive and error-prone.

 

To solve these challenges, Magnite and Databricks designed a solution that automates customer onboarding, removes manual data-sharing workflows, and enforces consistent validation and governance at scale, enabling standardized publishing by customers and streamlined consumption by Magnite using the Lakehouse and Delta Sharing.

 

Solution Architecture

Here’s how the end-to-end workflow is structured:

Marketplace & Onboarding

  • A listing on the Databricks Marketplace provides customers with the schema requirements for the Membership and Taxonomy tables, setup prerequisites, a notebook-based runbook, and a utility Python wheel file.
  • The Python wheel file is distributed through the Databricks Marketplace listing. Upon approval, customers receive access via a shared Unity Catalog volume, allowing them to securely install and run the setup workflow directly in their Databricks workspace, without requiring custom distribution.
  • This self-service experience significantly reduces onboarding friction and scales the publishing process across multiple customers.

Customer Side (Publishing)

  1. The customer downloads the Python wheel file and the notebook-based runbook to their Databricks workspace.
  2. The notebook utilizes the wheel file to run validation on the membership and taxonomy tables, configure CDF on specified tables, set up the share, create the recipient and enable Delta Sharing.
  3. Once Delta Sharing is enabled, the tables are ready for consumption by Magnite.
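
The publishing steps above can be sketched as the SQL statements a setup notebook might issue. This is a minimal illustration, not the contents of the actual wheel: the function name `build_publish_statements`, the table, share, and recipient names, and the sharing identifier are all hypothetical. `WITH HISTORY` is included on the assumption that the recipient reads the change data feed, which requires table history to be shared.

```python
from typing import List


def quote(identifier: str) -> str:
    """Backtick-quote each part of a dotted name such as catalog.schema.table."""
    if "`" in identifier:
        raise ValueError(f"unsupported character in identifier: {identifier!r}")
    return ".".join(f"`{part}`" for part in identifier.split("."))


def build_publish_statements(
    tables: List[str], share: str, recipient: str, sharing_identifier: str
) -> List[str]:
    """Return the SQL statements for the publishing steps, in order.

    All names passed in are hypothetical placeholders for illustration.
    """
    if "'" in sharing_identifier:
        raise ValueError("sharing identifier must not contain a single quote")

    statements = []
    # 1. Enable Change Data Feed on every table that will be shared.
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote(table)} "
            "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
        )
    # 2. Create the share and add the tables with history.
    statements.append(f"CREATE SHARE IF NOT EXISTS {quote(share)}")
    for table in tables:
        statements.append(
            f"ALTER SHARE {quote(share)} ADD TABLE {quote(table)} WITH HISTORY"
        )
    # 3. Create the recipient from its sharing identifier and grant access.
    statements.append(
        f"CREATE RECIPIENT IF NOT EXISTS {quote(recipient)} "
        f"USING ID '{sharing_identifier}'"
    )
    statements.append(
        f"GRANT SELECT ON SHARE {quote(share)} TO RECIPIENT {quote(recipient)}"
    )
    return statements
```

In a notebook, each returned statement could then be passed to `spark.sql(...)` in order. Generating the statements separately from running them keeps the setup easy to review before anything is changed in the workspace.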


 

Magnite Side (Consumption and Automation)

On the Magnite side, automation is driven by a Lakeflow job that polls at fixed intervals for newly created Delta Shares from customers.

When a new share is detected, the workflow performs the following steps:

  1. Validate schemas and table standards to ensure the shared Membership and Taxonomy tables conform to Magnite’s publishing requirements.
  2. Create a per-customer catalog that maps the shared tables into Magnite’s metastore, providing logical isolation and governance boundaries for each onboarded customer.
  3. Provision a per-customer Lakeflow ingestion job, which is dynamically instantiated to process that customer’s data feed.
  4. Trigger alerts and persist share-detection metadata to support operational visibility and onboarding workflows.
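
The detection and validation steps above can be illustrated with a small sketch. It assumes the listener already has the list of currently visible share names and the column types of each shared table; the column requirements in `REQUIRED_COLUMNS` are hypothetical stand-ins, since the actual schema requirements are defined in the Marketplace listing.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical required columns and types, for illustration only.
REQUIRED_COLUMNS: Dict[str, Dict[str, str]] = {
    "membership": {"user_id": "string", "segment_id": "string"},
    "taxonomy": {"segment_id": "string", "segment_name": "string"},
}


def find_new_shares(listed: Iterable[str], already_onboarded: Set[str]) -> List[str]:
    """Return share names that are visible now but not yet onboarded."""
    return sorted(set(listed) - already_onboarded)


def schema_violations(table: str, actual: Dict[str, str]) -> List[str]:
    """Compare a table's actual column types against the required ones.

    Returns a list of human-readable problems; an empty list means the
    table conforms. Extra columns are tolerated in this sketch.
    """
    problems = []
    for column, expected_type in REQUIRED_COLUMNS[table].items():
        if column not in actual:
            problems.append(f"{table}: missing column '{column}'")
        elif actual[column].lower() != expected_type:
            problems.append(
                f"{table}: column '{column}' is {actual[column]}, "
                f"expected {expected_type}"
            )
    return problems
```

A listener run could call `find_new_shares` on each polling interval and only provision a catalog and ingestion job for shares whose tables return no violations.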

Each per-customer Lakeflow job then:

  1. Reads incremental updates from the customer’s tables using Change Data Feed (CDF).
  2. Applies transformations such as filtering, partitioning, and format conversion (to gzip-compressed JSON).
  3. Writes the processed output into a governed UC external volume for downstream consumption.
  4. Records detailed processing metadata, including CDF record counts, commit versions, and job execution metrics into Delta Lake tables for auditability, monitoring, and troubleshooting.
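
As a rough sketch of the transformation and metrics steps above, the following plain-Python function takes change rows as dictionaries, drops the pre-update images, and emits gzip-compressed JSON lines together with simple processing metrics. The `_change_type` and `_commit_version` fields are the metadata columns that Change Data Feed attaches to each row; the function name, the choice to drop `update_preimage` rows, and the shape of the metrics dictionary are assumptions made for illustration. In a real job the rows would come from a Spark read of the change feed rather than an in-memory list.

```python
import gzip
import io
import json
from typing import Any, Dict, Iterable, Tuple


def to_gzip_json_lines(
    change_rows: Iterable[Dict[str, Any]],
) -> Tuple[bytes, Dict[str, Any]]:
    """Convert CDF rows to gzip-compressed JSON lines and collect metrics."""
    record_counts: Dict[str, int] = {}
    last_commit_version = None
    buffer = io.BytesIO()

    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for row in change_rows:
            change_type = row["_change_type"]
            # Count every change type seen, including the ones filtered out.
            record_counts[change_type] = record_counts.get(change_type, 0) + 1
            version = row["_commit_version"]
            if last_commit_version is None or version > last_commit_version:
                last_commit_version = version
            # Filtering step (an assumption): skip the old values of updates.
            if change_type == "update_preimage":
                continue
            line = json.dumps(row, sort_keys=True) + "\n"
            gz.write(line.encode("utf-8"))

    metrics = {
        "record_counts": record_counts,
        "last_commit_version": last_commit_version,
    }
    return buffer.getvalue(), metrics
```

The returned bytes could be written to a file in the external volume, and the metrics dictionary appended to an audit table so that the next run knows which commit version to start from.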

This two-tier design cleanly separates orchestration from data processing, enabling Magnite to onboard new customers automatically while maintaining isolation, observability, and consistent operational behavior at scale.


 

Why Databricks Made This Architecture Possible

This solution relies on native Databricks platform capabilities that simplify onboarding, enforce data quality, and make inbound data sharing scalable and operationally reliable.

Databricks Marketplace productizes onboarding into a self-service experience. Magnite’s customers get a standardized listing with schema requirements, guided runbooks, and a packaged Python wheel for validation and setup, dramatically reducing onboarding time and enabling consistent adoption across many customers.

Delta Sharing provides the core sharing abstraction. Magnite’s customers can programmatically create shares and recipients, enable sharing, and expose governed, read-only access to live Delta tables. This enables zero-copy sharing, where data remains in the provider’s environment and is accessed without duplicating datasets or building custom export pipelines, reducing the overhead and complexity of data movement while preserving secure, governed access to live data.

Delta Lake Change Data Feed (CDF) is what makes scalable ingestion practical for Magnite’s use case. While Delta Sharing enables live access to customer tables, Magnite requires its own managed copy of the data for transformation, enrichment, downstream processing, and operational isolation. CDF allows Magnite to maintain that copy efficiently by consuming only incremental row-level changes instead of performing full table refreshes. Without CDF, Magnite would need to build and operate a custom CDC mechanism to replicate partner data into its environment.
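
To make the idea of maintaining a copy from incremental changes concrete, here is a simplified in-memory sketch. It is not Magnite's implementation: the key column `id`, the function name, and the dictionary-based copy are hypothetical, and a production job would more likely merge changes into a Delta table. The change types (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` ordering column are the ones Change Data Feed provides.

```python
from typing import Any, Dict, Iterable

# Metadata columns that CDF adds to each change row.
CDF_METADATA = ("_change_type", "_commit_version", "_commit_timestamp")


def apply_changes(
    copy: Dict[Any, Dict[str, Any]],
    changes: Iterable[Dict[str, Any]],
    key: str = "id",
) -> Dict[Any, Dict[str, Any]]:
    """Apply row-level changes to a keyed copy, oldest commit first.

    Assumes each key is changed at most once per commit, so ordering by
    commit version is sufficient.
    """
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        data = {k: v for k, v in row.items() if k not in CDF_METADATA}
        if change_type in ("insert", "update_postimage"):
            copy[data[key]] = data  # upsert the new row image
        elif change_type == "delete":
            copy.pop(data[key], None)
        # update_preimage rows carry the old values and are ignored here.
    return copy
```

Only the rows that changed since the last processed commit need to be read, which is what avoids a full table refresh on every run.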

Databricks REST APIs are what make the workflow fully automated at scale. Magnite uses APIs to discover new shares, enumerate shared tables/metadata, create per-customer catalogs, provision per-customer Lakeflow jobs, and update operational queries/alerts, eliminating manual onboarding steps and lowering engineering overhead.
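
The share discovery and catalog creation calls might look roughly like the request descriptions below. The sketch only builds the requests and does not send them. The endpoint paths and the `provider_name` and `share_name` fields follow the Unity Catalog REST API as understood at the time of writing and should be verified against the current API reference; the workspace host, provider name, and catalog naming are hypothetical.

```python
from typing import Any, Dict
from urllib.parse import quote


def list_provider_shares_request(host: str, provider: str) -> Dict[str, Any]:
    """Describe the request that lists the shares offered by one provider."""
    base = host.rstrip("/")
    return {
        "method": "GET",
        "url": f"{base}/api/2.1/unity-catalog/providers/{quote(provider, safe='')}/shares",
    }


def create_catalog_request(
    host: str, catalog: str, provider: str, share: str
) -> Dict[str, Any]:
    """Describe the request that mounts a share as a per-customer catalog."""
    base = host.rstrip("/")
    return {
        "method": "POST",
        "url": f"{base}/api/2.1/unity-catalog/catalogs",
        "json": {
            "name": catalog,            # hypothetical naming convention
            "provider_name": provider,  # provider object for the customer
            "share_name": share,        # share detected by the listener
        },
    }
```

Each description could be sent with an HTTP client of choice, adding an `Authorization: Bearer <token>` header for the workspace.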

Lakeflow Jobs unify orchestration and streaming. A persistent “share listener” job detects new customer shares, validates schemas, and provisions the per-customer ingestion jobs that run the transformation pipelines.

The Databricks Lakehouse stores operational metadata, CDF metrics, and audit logs in Delta tables, providing built-in observability and troubleshooting.

Together, these capabilities reduce onboarding friction, improve data quality and consistency, lower engineering overhead, and enable scalable adoption. Without Databricks, Magnite would have needed custom export pipelines, external orchestration, bespoke governance layers, a separate control plane for automation, and manual customer setup, resulting in higher cost, slower onboarding, and a more fragile system.

 

Business Value

Creating this new pipeline into Magnite opens up new pathways for data monetization, reinforcing our broader narrative of flexible, privacy-safe onboarding. By integrating directly with Databricks, we make it easier for partners to activate their data within Magnite, expanding how and where data can drive value. 

 

Conclusion & Next Steps

Inbound data sharing is becoming critical in modern data ecosystems. By leveraging the Databricks Lakehouse, Delta Sharing, Change Data Feed, and a Marketplace-first approach, Magnite has built a solution that makes it easy for customers to publish trusted data, and easy for Magnite to consume it at scale. Together, this enables faster insights, stronger collaboration, and greater value from shared data.

This solution is also a strong example of how Databricks Professional Services partners with customers to accelerate time to value. By combining deep platform expertise with Magnite’s domain knowledge, the joint team delivered a scalable, production-grade inbound data-sharing architecture, turning what would have been a complex, bespoke integration into a repeatable, productized workflow.