
Agent outside Databricks communicating with a Databricks MCP server

maikel
New Contributor II

Hello Community!

I have the following use case in my project:

User -> AI agent -> MCP Server -> Databricks data from unity catalog.

- The AI agent is not created in Databricks.
- The MCP server is created in Databricks and should expose tools to get data from a given Unity Catalog.

I can see at https://docs.databricks.com/aws/en/generative-ai/mcp/custom-mcp that it should be possible to host an MCP server in a Databricks app. However, my question is: would it be possible to connect it to an agent outside Databricks? Additionally, should I implement methods in my custom MCP server to work with Databricks data via REST, or can this be simplified given that the MCP server runs inside Databricks?

Thanks a lot for the support!
Michal

3 REPLIES

mark_ott
Databricks Employee

Yes, it is possible to connect an external AI agent to an MCP (Model-Serving Custom Platform) server hosted within Databricks, and there are some benefits and options for working with Databricks data that depend on your architecture choices.

Connecting External AI Agents to Databricks MCP

  • You can host your custom MCP server within Databricks, and expose its endpoints via the authentication and networking setup that Databricks provides (such as allowing external access via secure endpoints or APIs).

  • As long as your MCP server is accessible over the network (using public or secure/private endpoints that you configure), an AI agent running outside of Databricks can connect to it by making REST API calls or using SDKs compatible with the MCP interface; a minimal sketch follows this list.
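
To make this concrete, here is a minimal Python sketch of an external agent calling a tool endpoint on a Databricks-hosted MCP server over HTTPS. The URL, route, and payload shape are hypothetical placeholders; the real interface depends on how your MCP server exposes its tools.

```python
import os

import requests

# Hypothetical URL of an MCP server hosted as a Databricks app; substitute
# the real hostname and the route your server actually exposes.
MCP_URL = "https://my-mcp-app.example.databricksapps.com/api/tools/query_table"

# Authenticate with a bearer token (PAT or OAuth access token) from the environment.
headers = {
    "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
    "Content-Type": "application/json",
}

# Hypothetical tool-call payload: ask the server to read a Unity Catalog table.
payload = {"table": "main.sales.orders", "limit": 10}

resp = requests.post(MCP_URL, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```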

MCP Methods for Unity Catalog Data

  • Direct Databricks Access: Since your MCP server is hosted inside Databricks, it typically has faster and more privileged access to Databricks-managed resources, like Unity Catalog. You can leverage Databricks-native APIs or even direct Spark jobs within the MCP server's code to access and process Unity Catalog data efficiently.

  • REST vs. Native Access:

    • If you host MCP inside Databricks, your MCP methods can directly interact with Unity Catalog using Databricks SDKs (Python, Scala, SQL) or Spark APIs without the overhead of going through REST APIs.

    • If you plan to keep MCP outside Databricks, you would have to use Databricks REST APIs to interact with Unity Catalog, which adds additional network calls and potential complexity.

  • Recommended: Since MCP is inside Databricks, implement direct access methods using Databricks APIs rather than wrapping REST calls; this approach is more streamlined, efficient, and secure. A sketch follows below.
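
As one illustration, an MCP server running inside Databricks can query Unity Catalog through the Databricks Python SDK rather than hand-rolled REST calls. This is a minimal sketch assuming the databricks-sdk package and an existing SQL warehouse; the warehouse ID and table name are placeholders.

```python
from databricks.sdk import WorkspaceClient

# Inside Databricks (e.g., a Databricks app), WorkspaceClient() picks up
# credentials from the runtime environment, so no tokens are handled in code.
w = WorkspaceClient()

# Placeholder warehouse ID and Unity Catalog table; substitute your own.
result = w.statement_execution.execute_statement(
    warehouse_id="1234567890abcdef",
    statement="SELECT * FROM main.sales.orders LIMIT 10",
)

# For short queries the rows come back inline; larger results require
# polling or fetching result chunks.
for row in result.result.data_array:
    print(row)
```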

Summary Table

| Component | Location | Best Access Method | Notes |
|---|---|---|---|
| AI Agent | Outside Databricks | MCP REST API | Connects to MCP over the network |
| MCP Server | Inside Databricks | Direct Databricks/Spark APIs | Native, fast, secure |
| MCP Server | Outside Databricks | Databricks REST API | Less direct, more overhead |
| Unity Catalog (data layer) | Managed by Databricks | | |

Key Recommendations

  • Connect your external AI agent to the MCP server via REST without issues as long as networking and permissions are properly configured.

  • Since MCP is running in Databricks, use internal Databricks APIs for Unity Catalog access instead of building REST-based data access in your MCP server logic.

This approach will offer you more efficient, secure, and robust access to your data within Databricks while supporting external AI agent connectivity.

maikel
New Contributor II

Hello Mark!

Thanks a lot for the response! By MCP I meant a Model Context Protocol server, not "Model-Serving Custom Platform".

I am thinking about these two approaches:

  • AI agent outside Databricks -> MCP server outside Databricks -> data in Unity Catalog

The reason for this approach is that I would like to have more control over the MCP deployment, resources, etc. In this scenario, as you mentioned, we should go with the REST API between the MCP server and the Unity Catalog data. Could you please advise what the best option is for authentication through code (without browser involvement)? Can I create a secure endpoint for this? If so, how can I do it?

  • AI agent outside Databricks -> MCP server inside Databricks -> data in Unity Catalog

Can you please share more details on how I can create a secure endpoint in Databricks so that my agent can authenticate to the MCP server?
If the MCP server is inside Databricks, does that mean that in my Python code I can just use, e.g., PySpark functions to access the data? Is no authorization required?

Thank you!

Best regards,

Michal

mark_ott
Databricks Employee

Hopefully this helps...

You can securely connect your external AI agent to a Model Context Protocol (MCP) server and Unity Catalog while maintaining strong control over authentication and resource management. The method depends on whether MCP is outside or inside Databricks. Below are best practices and details for secure endpoint creation and authentication.

MCP Outside Databricks

When the MCP server is outside Databricks and needs to access Unity Catalog data, REST API calls are used. Secure, code-based authentication (no browser) is achievable with the methods below; a combined sketch follows this list:

  • Personal Access Token (PAT) Authentication:
    You can generate a Databricks PAT and include it in the Authorization header of your REST API requests. This is suitable for automation and code-only flows, without browser involvement.

    • Generate PAT in Databricks: Go to User Settings > Access Tokens.

    • Use the header:

      Authorization: Bearer <token>
    • Store the PAT securely (environment variable, secret manager).

  • Service Principal (Workspace-Managed Identity):
    For production and enterprise setups, use a service principal registered with Databricks.

    • Authenticate via client credentials (client ID/secret or certificate) using OAuth2 flows from your MCP or agent’s code.

    • Obtain workspace and catalog access via API, using the principal’s scopes and roles.

  • Securing the Endpoint:

    • Host your MCP server on a secure cloud VM or service (ensure HTTPS/TLS).

    • Require authentication for access and provide only secure REST API endpoints.

    • Store credentials (tokens/secrets) outside the codebase, preferably with a secret management service.
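
As an illustration of both code-only options, the sketch below sends a PAT as a bearer token and also fetches an OAuth access token with a service principal's client credentials. The workspace URL is a placeholder; the /oidc/v1/token endpoint and all-apis scope follow Databricks' documented OAuth machine-to-machine flow, but verify them against the current docs for your cloud.

```python
import os

import requests

WORKSPACE = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"  # placeholder

# Option 1: Personal Access Token (PAT) as a bearer token.
pat_headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Option 2: OAuth2 client credentials for a service principal (M2M flow).
token_resp = requests.post(
    f"{WORKSPACE}/oidc/v1/token",
    auth=(os.environ["DATABRICKS_CLIENT_ID"], os.environ["DATABRICKS_CLIENT_SECRET"]),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
token_resp.raise_for_status()
oauth_headers = {"Authorization": f"Bearer {token_resp.json()['access_token']}"}

# Either header works against the Unity Catalog REST API, e.g. listing catalogs.
resp = requests.get(
    f"{WORKSPACE}/api/2.1/unity-catalog/catalogs", headers=oauth_headers, timeout=30
)
resp.raise_for_status()
print([c["name"] for c in resp.json().get("catalogs", [])])
```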

Secure Endpoints in Databricks

When deploying MCP inside Databricks, you can use Databricks native security and authentication mechanisms:

  • Databricks REST Endpoints:

    • You can create standard REST endpoints in Databricks, protected by workspace authentication.

    • Set up the endpoint as a Databricks app (the documented host for a custom MCP server) or as an MLflow model serving endpoint.

    • Secure using Bearer tokens (PATs) or service principals, as above.

  • Unity Catalog Access Control:

    • Assign catalog, schema, and table permissions to users, groups, or service principals in Unity Catalog.

    • Only entities with the appropriate permissions (via access control lists/policies) can query data; a sample grant is sketched below.
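
For example, granting a service principal read access to one table can be done in SQL from a notebook or job, here via PySpark's spark.sql (spark is predefined by the Databricks runtime; the catalog, schema, table, and application ID are placeholders):

```python
# Runs in a Databricks notebook or job, where `spark` is predefined.
# Grant a service principal (by application ID) read access to a single table.
sp = "`11111111-2222-3333-4444-555555555555`"  # placeholder application ID
spark.sql(f"GRANT USE CATALOG ON CATALOG main TO {sp}")
spark.sql(f"GRANT USE SCHEMA ON SCHEMA main.sales TO {sp}")
spark.sql(f"GRANT SELECT ON TABLE main.sales.orders TO {sp}")
```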

MCP Inside Databricks: Accessing via Python

  • Native Access to Unity Catalog:

    • If MCP is running inside a Databricks workspace (e.g., as a notebook, job, or managed MLflow endpoint), it can access Unity Catalog directly using PySpark, Databricks SQL, or REST API.

    • Authorization is seamless if the code runs under a user/service principal with catalog permissions.

    • Best practice: assign least-privileged roles, and audit usage.

  • No Explicit Authorization Required:

    • When running inside Databricks under the correct user or service principal, no extra authentication steps are needed; Databricks manages the session tokens.

    • Access is governed by Databricks' authentication context, so spark.read.table("catalog.schema.table") will work if permissions are set (see the sketch below).
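
A minimal sketch of that pattern, as it would run inside a Databricks notebook or job (the table name is a placeholder):

```python
# Inside Databricks, `spark` is predefined and carries the caller's identity,
# so no tokens appear in code; Unity Catalog permissions decide what is readable.
df = spark.read.table("main.sales.orders")  # placeholder table
df.show(10)
```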

Summary Table: Authentication Approaches

| Scenario | Endpoint Security | Authentication in Code | Browser Needed? | Notes |
|---|---|---|---|---|
| AI agent -> MCP outside Databricks -> Unity Catalog | HTTPS REST API | PAT / service principal | No | Secure with token header or OAuth2 |
| MCP endpoint inside Databricks | Databricks REST/MLflow | PAT / service principal | No | Native workspace authentication |
| MCP via PySpark inside Databricks | Databricks runtime | Workspace session | No | Managed by workspace/session context |

Key Recommendations

  • Always use HTTPS for all endpoints.

  • Prefer service principals and managed identities for scalable, secure automation.

  • Store tokens and secrets securely, using environment variables or cloud secret managers.

  • Set fine-grained Unity Catalog permissions to limit data access to only what’s needed.

  • For MCP inside Databricks, leverage PySpark/DataFrame APIs for direct access, with minimal authentication setup required.

If you need code samples or steps for endpoint setup or OAuth2 authorization, please specify which platform (Azure, AWS, GCP) and Databricks environment you’re using.


For technical implementation details, including authorization flows and secure endpoint creation inside/outside Databricks, review the official Databricks documentation and your cloud provider's security guidelines.