Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
prasannacs
Databricks Employee

Today, consumers use technology to enrich their shopping experience through digital engagement, AI-driven interactions, and other digital channels before completing a purchase. Sellers, by contrast, often lack comparable technological support, which creates a notable disparity. A modern seller experience platform should empower sellers at least as much as consumers are empowered. It should give sellers access to all relevant data points and touchpoints, including visualizations of their sales objectives, intelligent analytics, and sales-assist chatbots or AI-driven recommendations.

Databricks and Salesforce, leaders in the data, AI, and customer relationship management (CRM) fields, complement each other in ways that can transform the seller experience. The Databricks Data Intelligence Platform allows your entire organization to use data and AI. Built on an open data lakehouse, it provides an open, unified foundation that processes and analyzes large amounts of data efficiently and understands the unique semantics of your data. Pairing Databricks with Salesforce, the leading CRM platform, creates a powerful combination that streamlines sales processes and delivers a modern seller experience. By combining Databricks' advanced analytics and machine learning models with Salesforce CRM data, sellers can gain deeper insights into customer behavior, predict sales trends, and personalize their engagement strategies more effectively.

[Figure: Salesforce-Databricks data strategy diagram]

In this blog, we review how using Databricks and Salesforce together can benefit an organization's sales and marketing teams. As shown in the diagram above, Databricks enables three key areas when integrated with sales and marketing systems:

  • Sales and marketing insights
  • Machine learning and GenAI services
  • Secure external data sharing

In this example, Salesforce is the sales and marketing experience layer, while Databricks is the data and AI processing layer. The processed data and insights can be served across multiple channels, including dashboards, reports, custom applications, and APIs. They can also be shared externally through Databricks' secure data-sharing capabilities.

Sales and marketing insights

Integrating diverse data sources into a unified, open platform like Databricks gives organizations a holistic view of customer behavior and campaign performance. These sources include Salesforce Sales Cloud and Marketing Cloud, Salesforce Data Cloud, Seismic, Gong, and other sales and marketing systems. Sales and marketing teams can then:

  • conduct deep, real-time analysis
  • monitor the effectiveness of their programs
  • adjust strategies promptly
  • optimize return on investment

Databricks recently introduced native connectors for integrating with Salesforce. With LakeFlow Connect and Lakehouse Federation, these connectors let customers access and derive insights from their Salesforce CRM and Data Cloud data in Databricks. Check out this blog for more information.

Databricks Salesforce Connectors:

Databricks LakeFlow Connect offers simple, efficient data ingestion from databases, file sources, and enterprise applications. The LakeFlow Connect Salesforce Connector makes it easy to ingest Salesforce CRM data into Databricks. Data teams can then join CRM insights with other data in the Databricks Data Intelligence Platform to deliver additional insights and more accurate predictions. LakeFlow Connect:

  • is simple to set up and maintain
  • is governed by Databricks Unity Catalog
  • supports incremental data processing

This frees customers from managing data pipeline infrastructure and from writing complex logic for incremental updates and merges.
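The following minimal Python sketch illustrates the incremental merge logic a hand-built pipeline would otherwise need. It pulls records changed since the last sync (a watermark), upserts them by primary key, and advances the watermark. This is an illustration under stated assumptions, not how LakeFlow Connect is implemented. The field names `Id` and `SystemModstamp` are modeled on Salesforce's record ID and audit timestamp, and the data is hypothetical.

```python
from datetime import datetime


def incremental_merge(target, changes, watermark):
    """Upsert records modified after `watermark` into `target` (keyed by Id).

    Returns the new watermark: the latest modification timestamp applied.
    """
    new_watermark = watermark
    for record in changes:
        modified = record["SystemModstamp"]
        if modified <= watermark:
            continue  # already applied in an earlier sync
        target[record["Id"]] = record  # insert a new row or overwrite an existing one
        new_watermark = max(new_watermark, modified)
    return new_watermark


# Hypothetical target table state after a previous sync.
leads = {
    "L001": {"Id": "L001", "Status": "New", "SystemModstamp": datetime(2024, 12, 1)},
}
last_sync = datetime(2024, 12, 1)

# Hypothetical changes pulled from the source since the last sync.
changes = [
    {"Id": "L001", "Status": "Qualified", "SystemModstamp": datetime(2024, 12, 2)},
    {"Id": "L002", "Status": "New", "SystemModstamp": datetime(2024, 12, 3)},
]

last_sync = incremental_merge(leads, changes, last_sync)
print(leads["L001"]["Status"], len(leads), last_sync.date())
```

A production version would also need to handle deleted records and ties on the cursor timestamp, which this sketch ignores.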

LakeFlow Connect ingests data into Databricks. The Salesforce Data Cloud Connector, powered by Databricks Lakehouse Federation, takes a different approach: it lets customers discover, query, and govern Salesforce data from Databricks without migrating it. The Data Cloud Connector currently uses JDBC. Databricks plans to add support for Delta Lake UniForm File Federation to enable larger-scale data sharing across platforms.

With this Bring-Your-Own-Lake (BYOL) approach, data in Salesforce Marketing Cloud can also be brought into Databricks through the Salesforce Data Cloud Connector. LakeFlow Connect and BYOL both provide simple, flexible options for working with Salesforce data from Databricks, so customers can choose the one that best fits their needs.

Here's an illustration of combining Salesforce, Gong, and Google Analytics datasets in Databricks to enable lead attribution for sales teams. The raw source datasets from these three systems can be replicated into Databricks with LakeFlow Connect, JDBC, or third-party ETL tools.

Salesforce Example Datasets:

Leads Dataset

Lead ID | Name | Source | Status | Score | Owner
L001 | Jane Doe | Website Form | Qualified | 85 | Alex Smith
L002 | John Smith | Trade Show | New | 40 | Lisa Brown
L003 | Emily White | Email Campaign | Contacted | 70 | Alex Smith

Opportunities Dataset

Opportunity ID | Lead ID | Stage | Amount | Close Date | Probability (%)
O001 | L001 | Negotiation | 50,000 | 2024-12-15 | 70
O002 | L002 | Proposal | 30,000 | 2024-12-20 | 50
O003 | L003 | Discovery | 10,000 | 2024-12-25 | 30

Gong Example Dataset:

Call Transcripts and Insights

Call ID | Opportunity ID | Sentiment | Keywords | Duration (mins) | Speaker Ratio (Rep/Client)
C001 | O001 | Positive | Pricing, ROI | 45 | 60:40
C002 | O002 | Neutral | Competitors | 30 | 70:30
C003 | O003 | Negative | Budget, Timeline | 20 | 50:50

Deal Progression

Opportunity ID | Risk Indicator | Last Activity Date | Next Steps
O001 | None | 2024-12-05 | Follow-up meeting on 2024-12-10
O002 | Stalled for 7 days | 2024-11-28 | Email to confirm proposal
O003 | Low engagement | 2024-11-30 | Schedule a discovery call

Google Analytics Example Dataset:

Traffic Metrics

Visitor ID | Source/Medium | Pages Viewed | Session Duration (mins) | Device Type
V001 | Google/Organic | 5 | 10 | Mobile
V002 | LinkedIn/Paid | 3 | 7 | Desktop
V003 | Direct/None | 8 | 20 | Tablet

Conversion Metrics

Visitor ID | Conversion Type | Conversion Value | Conversion Date
V001 | Form Submission | 0 | 2024-12-01
V002 | E-commerce Purchase | 2000 | 2024-12-02
V003 | Whitepaper Download | 0 | 2024-12-03

Use case scenario: Lead Attribution

Once the raw data from the sales and marketing systems lands in Databricks as streaming tables, data engineers can declaratively build DLT (Delta Live Tables) pipelines. These pipelines transform data from multiple raw sources into business-level aggregated views, such as materialized views, and keep them fresh. Changes in the source systems are processed incrementally and propagated to the aggregated views, which form the silver and gold layers of the Medallion lakehouse architecture. Data engineers can also define data quality constraints in DLT pipelines. Because Databricks manages the pipelines that materialize the views, customers do not have to manage pipeline infrastructure. Incremental processing from source (Salesforce Sales Cloud or bronze tables) to destination (materialized views) captures sales and marketing data in near real time and turns it into business insights.

The example below shows a business-level aggregated view. Salesforce leads are matched with their first website visit tracked in Google Analytics and enriched with Gong conversation insights. With this derived insight, each lead's attribution score can be calculated and ranked.

 

Lead ID | Name | Source/Medium | Pages Viewed | Call Sentiment | Stage | Amount | Engagement Score
L001 | Jane Doe | Google/Organic | 5 | Positive | Negotiation | 50,000 | 70
L002 | John Smith | LinkedIn/Paid | 3 | Neutral | Proposal | 30,000 | 50
L003 | Emily White | Direct/None | 8 | Negative | Discovery | 10,000 | 30
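The aggregated view above can be sketched as a chain of joins. This minimal plain-Python version stands in for what a DLT pipeline would express in SQL or PySpark. `expect_or_drop` here is a hypothetical helper that mimics a DLT `expect_or_drop` expectation (drop failing rows) on the lead score. The code labels two further assumptions:

  • The example datasets share no key between Salesforce leads and Google Analytics visitors, so a hypothetical `lead_to_visitor` mapping plays the role of identity resolution.
  • The blog does not define how Engagement Score is computed, so the sketch carries through the Opportunity Probability (%) values, which match the Engagement Score column in the sample.

```python
# Sample rows from the example datasets above.
leads = [
    {"lead_id": "L001", "name": "Jane Doe", "score": 85},
    {"lead_id": "L002", "name": "John Smith", "score": 40},
    {"lead_id": "L003", "name": "Emily White", "score": 70},
]
opportunities = [
    {"opp_id": "O001", "lead_id": "L001", "stage": "Negotiation", "amount": 50_000, "probability": 70},
    {"opp_id": "O002", "lead_id": "L002", "stage": "Proposal", "amount": 30_000, "probability": 50},
    {"opp_id": "O003", "lead_id": "L003", "stage": "Discovery", "amount": 10_000, "probability": 30},
]
calls = [
    {"opp_id": "O001", "sentiment": "Positive"},
    {"opp_id": "O002", "sentiment": "Neutral"},
    {"opp_id": "O003", "sentiment": "Negative"},
]
traffic = [
    {"visitor_id": "V001", "source_medium": "Google/Organic", "pages_viewed": 5},
    {"visitor_id": "V002", "source_medium": "LinkedIn/Paid", "pages_viewed": 3},
    {"visitor_id": "V003", "source_medium": "Direct/None", "pages_viewed": 8},
]
# HYPOTHETICAL: assume identity resolution already matched each lead
# to its first tracked website visitor.
lead_to_visitor = {"L001": "V001", "L002": "V002", "L003": "V003"}


def expect_or_drop(rows, name, predicate):
    """Hypothetical helper: keep rows passing `predicate`, report drops."""
    kept = [r for r in rows if predicate(r)]
    if len(kept) < len(rows):
        print(f"expectation {name}: dropped {len(rows) - len(kept)} row(s)")
    return kept


def build_attribution_view():
    valid_leads = expect_or_drop(leads, "valid_score", lambda r: 0 <= r["score"] <= 100)
    opp_by_lead = {o["lead_id"]: o for o in opportunities}
    call_by_opp = {c["opp_id"]: c for c in calls}
    visit_by_id = {t["visitor_id"]: t for t in traffic}

    view = []
    for lead in valid_leads:
        opp = opp_by_lead.get(lead["lead_id"])
        visit = visit_by_id.get(lead_to_visitor.get(lead["lead_id"]))
        if opp is None or visit is None:
            continue  # inner join on opportunities and web visits
        call = call_by_opp.get(opp["opp_id"], {})  # left join on Gong calls
        view.append({
            "lead_id": lead["lead_id"],
            "name": lead["name"],
            "source_medium": visit["source_medium"],
            "pages_viewed": visit["pages_viewed"],
            "call_sentiment": call.get("sentiment"),
            "stage": opp["stage"],
            "amount": opp["amount"],
            # Stand-in: in the sample, Engagement Score equals Probability (%).
            "engagement_score": opp["probability"],
        })
    # Rank leads by score, highest first.
    return sorted(view, key=lambda r: r["engagement_score"], reverse=True)


for row in build_attribution_view():
    print(row["lead_id"], row["source_medium"], row["call_sentiment"], row["engagement_score"])
```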

Jane Doe (L001) originated from organic search and showed high engagement (5 pages viewed). Positive Gong sentiment and an advanced deal stage suggest a high likelihood of closure.

John Smith (L002) came via LinkedIn ads, but his engagement metrics and deal progression are moderate. The risk flagged in Gong (the deal has stalled for 7 days) suggests that follow-up is required.

Emily White (L003) shows high web engagement, but negative Gong sentiment and an early deal stage suggest a risk of losing the lead. Action: address objections during discovery.

These sales and marketing insights are typically delivered to sellers as dashboards and reports. Such reports are often static and do not capture sellers' own intuition.

Databricks AI/BI Genie lets sellers interact with their data using natural language. They can explore the data interactively and drill into metrics to uncover deeper insights. Genie capabilities are also available through APIs, so they can be integrated into LLM-based agents. Below are a few examples of questions a seller could ask about the lead attribution dataset in natural language with AI/BI Genie.

  1. What are the top sources generating high-quality leads?
    • Are leads from the "Website Form" or "Trade Shows" performing better in terms of conversion rates?
  2. What is the average number of pages viewed by leads in each stage?
  3. Which stage of the pipeline has the most drop-offs?
    • Are most leads failing to progress beyond "Proposal" or "Negotiation" stages?
  4. Are there seasonal trends in lead source performance?
    • For example, do trade show leads perform better in Q2 than in Q4?
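Genie answers questions like these by generating SQL over the tables in its space. To illustrate what questions 1 and 2 compute, here they are as plain-Python group-by aggregations over the sample data. This is not a Genie API, and treating average lead score as the measure of lead quality is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Sample rows: (source, lead score) from the Leads dataset and
# (stage, pages viewed) from the lead attribution view.
lead_scores = [("Website Form", 85), ("Trade Show", 40), ("Email Campaign", 70)]
stage_pages = [("Negotiation", 5), ("Proposal", 3), ("Discovery", 8)]


def average_by_group(pairs):
    """Group (key, value) pairs by key and return {key: mean of values}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Question 1: sources ranked by average lead score (assumed proxy for quality).
by_source = average_by_group(lead_scores)
top_sources = sorted(by_source, key=by_source.get, reverse=True)
print(top_sources)

# Question 2: average pages viewed per pipeline stage.
print(average_by_group(stage_pages))
```

With only one lead per group in the sample, each average equals that lead's value. The same aggregation applies unchanged to a full dataset.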

Machine learning and GenAI services - Revolutionizing Seller Platforms with Databricks Mosaic AI

Unlike Salesforce Einstein AI, Databricks Mosaic AI offers a broad set of capabilities:

  • model development
  • serving
  • inference
  • evaluation
  • monitoring and observability

It can also comprehensively analyze diverse data sources such as web analytics, social media, and third-party data. Databricks can combine these insights with Salesforce data, providing a more holistic view of customer behavior than Salesforce Einstein can offer on its own. Furthermore, Databricks ML, AI, and GenAI models can be exposed as services within Einstein AI through Bring Your Own Model (BYOM), enabling Salesforce applications to build enriched experiences on Databricks Mosaic AI models.

Databricks is the only AI platform that unifies governance for all machine learning assets, from data and features to models, in a single catalog. This ensures complete visibility and fine-grained control throughout the AI workflow. Because the platform integrates data and AI, it provides automatic lineage tracking, centralized governance, collaboration, and monitoring that detects anomalies across all data and AI workflows. Together, these reduce time to value and operational costs.

Here's an example of using Salesforce, Gong, and Google Analytics data to train a machine learning model that predicts the likelihood of closing a deal. At a high level:

  1. Create feature tables for each source (Salesforce, Gong, Google Analytics).
  2. Combine them into a unified dataset.
  3. Use the dataset for training and deploying ML models.
  4. Serve features for real-time inference and analytics.

Feature Engineering

The model incorporates features from each source:

  • Salesforce: Lead Score (lead quality), Stage (the current position in the sales pipeline, e.g., Discovery or Proposal), Opportunity Amount (the deal value), and Close Date (the expected timeframe for closing the deal).
  • Gong: Call Sentiment (Positive, Neutral, or Negative), Call Duration (total time spent on calls related to the deal), Keywords (the presence of specific terms, encoded as binary flags or frequency counts), and Risk Indicator (flags stalled deals or low engagement).
  • Google Analytics: Source/Medium (the origin of traffic, e.g., Organic or Paid), Pages Viewed (the number of pages the lead visited), Session Duration (total time spent on the website), and Conversion Actions (high-value actions taken by the lead, such as form submissions).
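Here is a minimal sketch of turning these raw fields into a numeric feature row for one lead, using the L001, O001, and V001 rows from the example datasets. The encodings are hypothetical choices for illustration, not the author's method:

  • ordinal encoding for stage and sentiment
  • one-hot encoding for medium
  • binary flags for keywords
  • days until close, relative to an `as_of` date

```python
from datetime import date

# HYPOTHETICAL encodings; adapt them to your own pipeline.
STAGE_ORDER = {"Discovery": 0, "Proposal": 1, "Negotiation": 2}
SENTIMENT = {"Negative": -1, "Neutral": 0, "Positive": 1}
KEYWORDS = ["Pricing", "ROI", "Competitors", "Budget", "Timeline"]
MEDIUMS = ["Organic", "Paid", "None"]


def encode_features(sf, gong, ga, as_of):
    """Turn raw Salesforce, Gong, and Google Analytics fields into numbers."""
    features = {
        # Salesforce
        "lead_score": sf["score"],
        "stage": STAGE_ORDER[sf["stage"]],
        "amount": sf["amount"],
        "days_to_close": (sf["close_date"] - as_of).days,
        # Gong
        "sentiment": SENTIMENT[gong["sentiment"]],
        "call_minutes": gong["duration_mins"],
        "at_risk": int(gong["risk"] != "None"),
        # Google Analytics
        "pages_viewed": ga["pages_viewed"],
        "session_minutes": ga["session_mins"],
        "converted": int(ga["conversion_type"] is not None),
    }
    # Binary keyword flags: one column per tracked keyword.
    present = {k.strip() for k in gong["keywords"].split(",")}
    for kw in KEYWORDS:
        features[f"kw_{kw.lower()}"] = int(kw in present)
    # One-hot encode the medium part of "Source/Medium".
    medium = ga["source_medium"].split("/")[1]
    for m in MEDIUMS:
        features[f"medium_{m.lower()}"] = int(medium == m)
    return features


row = encode_features(
    sf={"score": 85, "stage": "Negotiation", "amount": 50_000, "close_date": date(2024, 12, 15)},
    gong={"sentiment": "Positive", "duration_mins": 45, "risk": "None", "keywords": "Pricing, ROI"},
    ga={"pages_viewed": 5, "session_mins": 10, "source_medium": "Google/Organic",
        "conversion_type": "Form Submission"},
    as_of=date(2024, 12, 5),
)
print(row["stage"], row["days_to_close"], row["kw_roi"], row["medium_organic"])
```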

The model's target variable is Deal Closure, which is a binary outcome: 1 for closed deals and 0 for lost deals.

Model implementation workflow

The workflow consists of several key steps. First, in the Data Preprocessing phase, datasets are joined on unique identifiers such as Lead ID or Opportunity ID. Missing values are handled, and categorical features are encoded as numerical values. In the Feature Engineering stage, a composite risk score is created from Gong and Salesforce information, and website behavior metrics are aggregated for analysis.
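Filling missing values and building a composite risk score can both be sketched in a few lines. The blog does not specify the risk formula, so `composite_risk` below is a hypothetical weighting of three signals:

  • Gong sentiment
  • the Gong risk indicator
  • days since the last deal activity

The inputs come from the Gong Call Transcripts and Deal Progression tables, evaluated as of 2024-12-05. The missing session duration is also hypothetical, since the sample data has no gaps.

```python
from datetime import date
from statistics import median


def impute_median(rows, column):
    """Fill missing (None) values in `column` with the median of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return rows


def composite_risk(sentiment, risk_indicator, last_activity, as_of):
    """HYPOTHETICAL risk score in [0, 1] mixing Gong and CRM activity signals.

    The weights and the 14-day inactivity horizon are illustrative only.
    """
    negative = 1.0 if sentiment == "Negative" else 0.0
    flagged = 0.0 if risk_indicator == "None" else 1.0
    idle = min((as_of - last_activity).days / 14, 1.0)
    return round(0.4 * negative + 0.3 * flagged + 0.3 * idle, 3)


# Hypothetical extract in which one session duration is missing.
rows = [{"lead_id": "L001", "session_mins": 10},
        {"lead_id": "L002", "session_mins": None},
        {"lead_id": "L003", "session_mins": 20}]
impute_median(rows, "session_mins")
print([r["session_mins"] for r in rows])

as_of = date(2024, 12, 5)
print(composite_risk("Positive", "None", date(2024, 12, 5), as_of))                 # O001
print(composite_risk("Neutral", "Stalled for 7 days", date(2024, 11, 28), as_of))   # O002
print(composite_risk("Negative", "Low engagement", date(2024, 11, 30), as_of))      # O003
```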

Next, in Model Selection, the data is split into training and testing sets, and a classification model such as Random Forest or XGBoost is trained. In Model Evaluation, metrics such as accuracy and F1-score are used to assess the model's effectiveness, and hyperparameters are tuned for optimal results.
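Training itself would normally use a library implementation of Random Forest or XGBoost. The data split and the evaluation metrics named above, however, are simple enough to show directly. This plain-Python sketch uses a seeded shuffle for a reproducible split (`split_train_test` is a hypothetical helper). It then computes accuracy and F1 = 2PR / (P + R) from hypothetical labels and predictions.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` reproducibly and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = deal closed, 0 = deal lost)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy_and_f1(y_true, y_pred))

train, test = split_train_test(list(range(8)))
print(len(train), len(test))
```

F1 is the harmonic mean of precision and recall. That makes it a useful complement to accuracy when closed and lost deals are imbalanced.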

Example Dataset for Training

Lead ID | Lead Score | Stage | Amount | Sentiment | Pages Viewed | Session Duration (mins) | Deal Closed (1 = closed, 0 = lost)
L001 | 85 | Negotiation | 50,000 | Positive | 5 | 10 | 1
L002 | 40 | Proposal | 30,000 | Neutral | 3 | 7 | 0
L003 | 70 | Discovery | 10,000 | Negative | 8 | 20 | 0

This approach offers several benefits:

  • It identifies high-potential deals, so resources can be focused on those most likely to close.
  • It enables proactive risk management by flagging at-risk deals early for targeted intervention.
  • It supports more accurate sales forecasting based on predicted closure probabilities.
  • Its predictions can be used to refine lead scoring in Salesforce.
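Probability-weighted forecasting is straightforward arithmetic: expected revenue is the sum of each deal's amount times its closure probability. The sketch below uses the Opportunities dataset, with its Probability (%) column standing in for model predictions: 50,000 × 0.70 + 30,000 × 0.50 + 10,000 × 0.30 = 53,000. The 0.5 prioritization threshold is a hypothetical choice.

```python
# (opportunity ID, amount, closure probability). The probabilities come from the
# Probability (%) column of the Opportunities dataset and stand in for model output.
pipeline = [("O001", 50_000, 0.70), ("O002", 30_000, 0.50), ("O003", 10_000, 0.30)]


def weighted_forecast(opportunities):
    """Expected closed revenue: sum of amount x probability of closing."""
    return sum(amount * prob for _, amount, prob in opportunities)


def prioritize(opportunities, threshold=0.5):
    """IDs of deals at or above a (hypothetical) probability threshold, best first."""
    ranked = sorted(opportunities, key=lambda o: o[2], reverse=True)
    return [opp_id for opp_id, _, prob in ranked if prob >= threshold]


print(round(weighted_forecast(pipeline)))  # expected revenue
print(prioritize(pipeline))                # deals to focus on
```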

Overall, this use case allows sales and marketing teams to leverage data-driven insights, leading to improved win rates and operational efficiency.

Secure Data Sharing

Let's see how sellers and marketers can monetize their lead attribution dataset along with the proprietary ML model they created. Sales and marketing agencies can offer the lead attribution dataset to other sales platforms and consultants as a subscription through Databricks Marketplace. Seller organizations can also provide clients with anonymized, aggregated benchmarks so they can compare their performance against industry standards. Examples include lead conversion rates by source or average engagement metrics. These benchmarks can be monetized through one-time reports or recurring subscriptions.
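Here is a minimal sketch of producing such a benchmark. It aggregates lead outcomes into per-source conversion rates and suppresses any source with too few leads to share safely. The data and the `MIN_GROUP_SIZE` threshold are hypothetical. Minimum-group-size suppression is only one simple safeguard, not a complete anonymization scheme.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 3  # HYPOTHETICAL suppression threshold; set per your privacy policy


def conversion_benchmarks(leads, min_group_size=MIN_GROUP_SIZE):
    """Conversion rate by lead source, omitting groups too small to share.

    `leads` holds (source, converted) pairs; the output contains only
    per-source aggregates, never lead-level records.
    """
    totals = defaultdict(lambda: [0, 0])  # source -> [lead count, conversions]
    for source, converted in leads:
        totals[source][0] += 1
        totals[source][1] += int(converted)
    return {
        source: round(conv / n, 2)
        for source, (n, conv) in totals.items()
        if n >= min_group_size  # suppress small groups that could identify accounts
    }


# Hypothetical lead outcomes pooled across several client organizations.
sample = ([("Website Form", True)] * 3 + [("Website Form", False)]
          + [("Trade Show", True), ("Trade Show", False), ("Trade Show", False)]
          + [("Email Campaign", True)])
print(conversion_benchmarks(sample))
```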

Sales and marketing teams can use Databricks Marketplace to access datasets, AI, and analytical assets like ML models and notebooks without being tied to specific platforms, dealing with complex ETL processes, or incurring high replication costs. This open approach enables faster data utilization across different cloud platforms using preferred tools.

Sellers rely on market research data to expand into new market segments and sales territories. A modern seller experience platform must therefore support secure collaboration on external datasets that complies with regulations and exposes no first-party data or customer information. A vendor may be unwilling to share market research datasets unless the environment is secure and preserves data privacy. Vendors may also be on different clouds, in different regions, or even on different platforms. Databricks Clean Rooms enable businesses to collaborate with their customers and partners in a secure, privacy-preserving environment on any cloud. Sellers can share and analyze data with partners and other stakeholders without exposing sensitive information, gaining insights into customer behavior and optimizing their sales strategies.

Summary

This blog outlined how Databricks and Salesforce can be combined to power a modern seller experience platform. Salesforce is still the best platform for interfacing with sellers, and integrating it with Databricks expands the value of CRM data. The integration:

  • drives intelligent analytics
  • equips sellers with a holistic set of data points
  • produces meaningful insights from sales and marketing data
  • empowers sellers with AI tools such as sales-assist agents, bot applications, and more

Moreover, sales and marketing data already extends beyond Salesforce into systems such as Google Analytics, Gong, and Seismic, so it is critical to aggregate this data into a comprehensive view for sellers. Finally, the data and AI landscape is evolving faster than ever. Organizations must adopt a lakehouse architecture to unify, scale, and govern their sales and marketing datasets while avoiding vendor lock-in.

With Generative AI evolving rapidly, an open architecture like Mosaic AI allows organizations to control the cost and ownership of models. Mosaic AI is a unified platform for creating classic machine learning, AI, and Generative AI applications. The Databricks Mosaic AI platform enables teams to build and collaborate on compound AI systems from a single platform with centralized governance and a unified interface for training, tracking, evaluating, swapping, and deploying. Organizations can transition from general intelligence to data intelligence by utilizing enterprise sales and marketing data.

Seller experience platforms can accelerate innovation through strategic collaboration with suppliers, partners, and vendors. Databricks Clean Rooms facilitate this collaboration by providing a secure environment for sharing data, models, notebooks, and dashboards. They let businesses collaborate with their customers and partners across any cloud while preserving privacy and data security.