Databricks Community

ceceliac · ‎03-31-2025

What is the best way to get Salesforce Marketing Cloud data into Databricks? The Lakeflow / Federation connectors are limited to Salesforce and Salesforce Data Cloud right now. Are there plans to add Salesforce Marketing Cloud? The only current options we can find are an extension to Fivetran or this Python connector: Python Data Stream Retrievals

Thanks!

Louis_Frolio · a month ago

Hey @ceceliac, thanks for raising this. Here’s the current picture and the practical paths you can use today.

What Databricks supports today

The Lakehouse Federation connector for Salesforce Data Cloud is available and lets you query Data Cloud tables in place, zero-copy, under Unity Catalog governance.
The Lakeflow Connect ingestion connector for Salesforce Platform (Sales/Service Cloud) is GA and designed to copy CRM objects into Delta tables with incremental ingestion and UC governance.
The Salesforce ingestion connector does not currently support Marketing Cloud; the recommended alternative is to route Marketing Cloud (MC) data into Data Cloud, then use the Data Cloud connectors from Databricks.

Options to get Marketing Cloud data into Databricks now

Route SFMC to Data Cloud, then federate in Databricks
If your org enables Salesforce Data Cloud and connects Marketing Cloud to it (Salesforce provides MC→Data Cloud integrations), you can use Databricks Lakehouse Federation’s Salesforce Data Cloud connector to query the unified dataset without ingesting it.

This is the most “native” zero-copy path with consistent governance and immediate access for analytics and ML in Databricks.
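To make the zero-copy path a bit more concrete, here is a minimal sketch of how a notebook might query a federated Data Cloud table and, if needed, materialize it into Delta with a CTAS. The catalog, schema, and table names (`sfdc_data_cloud`, `email_engagement`, and so on) are hypothetical placeholders; substitute whatever your foreign catalog actually exposes.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level Unity Catalog name: catalog.schema.table."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def select_sql(source: str, limit: int = 100) -> str:
    """Ad hoc query against the federated (foreign) table; nothing is copied."""
    return f"SELECT * FROM {source} LIMIT {int(limit)}"


def materialize_sql(source: str, target: str) -> str:
    """CTAS that copies the federated table into a Delta table."""
    return f"CREATE OR REPLACE TABLE {target} AS SELECT * FROM {source}"


# Hypothetical names: a foreign catalog created for Data Cloud, and a target schema.
SOURCE = qualified_name("sfdc_data_cloud", "default", "email_engagement")
TARGET = qualified_name("main", "marketing", "email_engagement_snapshot")

if __name__ == "__main__":
    print(select_sql(SOURCE, limit=10))
    print(materialize_sql(SOURCE, TARGET))
    # In a Databricks notebook, where `spark` is predefined, you would run:
    # spark.sql(materialize_sql(SOURCE, TARGET))
```

Querying in place is usually enough for exploration; the CTAS is only worth scheduling when repeated queries against the foreign table become a performance concern.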

Use a partner ELT to ingest SFMC directly (e.g., Fivetran)
Fivetran provides a managed Salesforce Marketing Cloud connector that can land data in Databricks. It supports core entities (EMAIL, SEND, EVENT, LIST, SUBSCRIBER, JOURNEY, etc.) and data extensions. Note that data extensions are typically re-imported in full because of API limitations.

Build a custom pipeline on SFMC APIs (Python)
Salesforce’s Marketing Cloud Data Streams API Python connector (the link you shared) can be used to pull SFMC data. Land the outputs in cloud storage, then load them into Delta tables with Auto Loader or DLT for incremental processing and governance.

This approach gives you control over cadence, scope, and schema handling at the cost of more engineering ownership.
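As a rough sketch of the "land, then load" pattern described above: the first function writes one pull as a newline-delimited JSON file, and the second reads the landing folder with Auto Loader. Everything about the layout is an assumption for illustration: the folder convention, the stream name, and the idea that the landing location is reachable as a filesystem path (for example a Unity Catalog volume). The records themselves would come from whatever the Python connector returns.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Mapping


def land_batch(records: Iterable[Mapping], landing_root: str, stream: str,
               run_ts: datetime) -> Path:
    """Write one pull as a newline-delimited JSON file under a dated folder."""
    folder = Path(landing_root) / stream / f"ingest_date={run_ts:%Y-%m-%d}"
    folder.mkdir(parents=True, exist_ok=True)
    final_path = folder / f"{stream}_{run_ts:%Y%m%dT%H%M%SZ}.json"
    tmp_path = folder / f".{final_path.name}.tmp"
    with tmp_path.open("w", encoding="utf-8") as fh:
        for record in records:
            # default=str keeps dates and decimals from breaking serialization
            fh.write(json.dumps(record, default=str) + "\n")
    # Rename after writing so a half-written file never has the final name
    # (this holds on a POSIX filesystem; object stores behave differently).
    tmp_path.replace(final_path)
    return final_path


def load_with_auto_loader(spark, landing_path: str, schema_path: str,
                          checkpoint_path: str, target_table: str):
    """Incrementally load landed JSON files into a Delta table with Auto Loader.

    `spark` is the session a Databricks notebook or job provides; all paths and
    the table name are caller-supplied placeholders.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .load(landing_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is new, then stop
        .toTable(target_table)
    )


if __name__ == "__main__":
    # Hypothetical records standing in for one connector response.
    sample = [{"send_id": 1, "opens": 10}, {"send_id": 2, "opens": 4}]
    print(land_batch(sample, "/tmp/sfmc_landing", "sends", datetime.now(timezone.utc)))
```

Keeping the extract step and the load step separate means a failed load can be rerun from the landed files without calling the SFMC API again.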

Roadmap for a Databricks-managed SFMC connector

We’re actively tracking demand: there is an Aha idea for “Lakeflow Connect Connector for SFDC Marketing Cloud”, and the Lakeflow Connect timelines list SFMC as “in development” (timelines are subject to change and are not a commitment).
Internal guidance notes that Marketing Cloud is not supported in the current ingestion connector, and the recommended near-term path is MC→Data Cloud→Databricks Federation while we continue evaluating native SFMC ingestion demand and requirements.

Recommended architectures and trade-offs

Zero-copy (MC→Data Cloud→Federation)
Best when you already have Data Cloud or plan to; fastest time-to-insight, no ETL, and UC governance. Good for analytics/ML prototyping and production querying; you can materialize when needed for performance or downstream processing.
Managed ELT (Fivetran SFMC→Databricks)
Best when you want data resident in the lakehouse; covers a broad SFMC surface area and includes dbt models for SFMC analytics. Be aware that data extensions often require daily full re-imports, which can lengthen syncs and may warrant moving those objects into a dedicated connection.
Custom ingestion (Python/Data Streams API)
Best when you have bespoke needs, want tighter control, or need to minimize vendor dependencies. You own resilience, retries, and schema evolution; Databricks Auto Loader/DLT provide incremental processing and governance once landed.
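
Since the custom route means owning resilience and retries yourself, here is one possible shape for that: a small wrapper that retries a call with capped exponential backoff and jitter. `TransientError` is a hypothetical exception type; in practice you would map whatever the connector or HTTP client raises for timeouts and rate limits onto it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, rate limits)."""


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 1.0, max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Backoff ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

Wrapping each API pull as `call_with_retry(lambda: pull_page(...))`, where `pull_page` is your own function, keeps the retry policy in one place. Non-transient errors propagate immediately, so bad credentials or schema problems fail fast.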

Regards, Louis