When integrating Salesforce with Databricks to push data upon record creation, a serving endpoint is not the most common or optimal approach. Databricks Feature Serving endpoints can back model or feature APIs, but they are designed for real-time inference and feature retrieval, not for acting as general-purpose REST ingestion endpoints for Salesforce-originated data.
The Feature Spec Function you’re missing only appears when you use the Databricks Feature Engineering client (databricks-feature-engineering) and explicitly define a FeatureSpec in Unity Catalog. If you aren’t performing real-time feature lookups, you don’t need that setup.
Recommended Integration Approaches
1. Salesforce → Databricks via API Gateway or Middleware
The most reliable approach is to create a proxy API layer between Salesforce and Databricks rather than exposing Databricks directly.
You can:
- Create a small Express.js or Flask API that Salesforce calls after a record is created.
- The middleware forwards the data to Databricks via the Databricks REST API or a Delta Live Tables ingestion job (see the sketch after this list).
- This approach adds resiliency, logging, and retry mechanisms.
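As a rough illustration of the middleware pattern, here is a minimal Flask sketch that accepts a POST from Salesforce and triggers a Databricks job through the Jobs run-now API. The host, token, job ID, and record payload shape are assumptions you would replace with your own values, and it assumes the job's entry task is a notebook.

```python
# Minimal Flask middleware: receives a Salesforce callout and triggers a Databricks job.
# DATABRICKS_HOST, DATABRICKS_TOKEN, and DATABRICKS_JOB_ID are placeholders for your
# own workspace; the incoming record shape is illustrative only.
import json
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]    # PAT or service-principal token
DATABRICKS_JOB_ID = os.environ["DATABRICKS_JOB_ID"]  # job that ingests the record

@app.route("/salesforce/record-created", methods=["POST"])
def record_created():
    record = request.get_json(force=True)

    # Forward the record as notebook parameters via the Jobs 2.1 run-now API,
    # assuming the target job runs a notebook task.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={
            "job_id": int(DATABRICKS_JOB_ID),
            "notebook_params": {"record_json": json.dumps(record)},
        },
        timeout=30,
    )
    resp.raise_for_status()
    return jsonify({"run_id": resp.json().get("run_id")}), 202
```

In practice you would add authentication on the /salesforce/record-created route, plus logging and retries, which is exactly the resiliency layer Salesforce-to-Databricks direct calls lack.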
2. Salesforce Data Cloud → Databricks (Zero Copy Integration)
If you use Salesforce Data Cloud, use the new Zero Copy Data Sharing integration:
- Allows bi-directional data sync between Salesforce and Databricks.
- Eliminates ETL complexity and avoids REST API maintenance.
- Supports direct access via Iceberg tables without duplicating data (a sample query is shown below).
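Once the share is available in Unity Catalog, the Salesforce objects can be queried like any other table. The catalog, schema, table, and column names below (salesforce_share.crm.account) are hypothetical; use whatever names appear in your workspace after the share is set up.

```python
# Runs in a Databricks notebook, where `spark` is predefined.
# Query a Salesforce Data Cloud object shared into Unity Catalog (names are hypothetical).
df = spark.sql("""
    SELECT Id, Name, CreatedDate
    FROM salesforce_share.crm.account
    WHERE CreatedDate >= date_sub(current_date(), 7)
""")
df.show()
```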
3. Databricks Lakeflow Connect (ETL-based)
Databricks has Lakeflow Connect for ingesting data from Salesforce directly using secure connectors:
- Ideal for near-real-time or batch synchronization.
- Handles authentication, schema mapping, and incremental updates natively.
- Use this if your goal is data movement rather than live transactional events.
4. Event-driven Integration with Salesforce Platform Events
If you must trigger data transfers upon record creation:
- Use a Salesforce Flow or Apex trigger to send HTTP POST requests to your middleware API.
- The middleware then calls Databricks REST API endpoints (Jobs or Model Serving); a serving-endpoint call is sketched below.
- Avoid calling Databricks directly from Salesforce to reduce authentication and timeout issues.
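If the middleware also needs real-time enrichment rather than just ingestion, it can call a Model Serving endpoint through the standard invocations path. The endpoint name ("record-enricher") and input columns below are hypothetical; the dataframe_records payload is one of the input formats Model Serving accepts.

```python
# Sketch: middleware-side call to a Databricks Model Serving endpoint for enrichment.
# Endpoint name and record fields are hypothetical placeholders.
import os

import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

def enrich_record(record: dict) -> dict:
    resp = requests.post(
        f"{DATABRICKS_HOST}/serving-endpoints/record-enricher/invocations",
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
            "Content-Type": "application/json",
        },
        # Send the Salesforce record as a single-row dataframe_records payload.
        json={"dataframe_records": [record]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```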
Best Practices
- Avoid exposing Databricks endpoints publicly for direct Salesforce calls.
- Use service principals and Unity Catalog permissions to secure Databricks endpoints.
- For real-time model inference or enrichment, use Feature Serving; for ingesting raw transactional data, use ETL or middleware orchestration.
- Validate the integration with monitoring, retry logic, and API throttling (a simple retry sketch follows).
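For the retry and throttling point, a small backoff wrapper in the middleware is usually enough. The status codes and attempt count below are reasonable defaults, not Databricks-mandated values.

```python
# Simple exponential-backoff retry for middleware calls to Databricks (illustrative only).
import time

import requests

def post_with_retry(url: str, payload: dict, headers: dict, attempts: int = 4) -> requests.Response:
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=30)
            # Retry only on throttling (429) and transient server errors; return otherwise.
            if resp.status_code not in (429, 500, 502, 503, 504):
                return resp
        except requests.RequestException:
            pass  # network error: fall through to backoff and retry
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, 8s
    raise RuntimeError(f"Request to {url} failed after {attempts} attempts")
```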
In summary, creating a Databricks serving endpoint is not the best way for general data ingestion from Salesforce. The recommended setup is a middleware or Lakeflow Connect integration, with Feature Serving endpoints reserved for machine learning applications.