<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ai_query and cached tokens in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/ai-query-and-cached-tokens/m-p/154020#M1744</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Great question -- this is a nuanced topic because there are two layers involved: &lt;/SPAN&gt;&lt;STRONG&gt;Databricks' proxy layer&lt;/STRONG&gt;&lt;SPAN&gt; and &lt;/SPAN&gt;&lt;STRONG&gt;OpenAI's caching mechanism&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Short answer: No, ai_query does not currently support OpenAI's prompt caching.&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;1. ai_query doesn't expose token usage metadata&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;ai_query is a SQL function that returns only the model's text response -- it does &lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt; return the full response object, including usage.prompt_tokens_details.cached_tokens. So even if caching were happening behind the scenes, you'd have no way to verify it from the ai_query output.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;2. Databricks Foundation Model APIs act as a proxy&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;When you call an OpenAI model through Databricks (whether via ai_query, the REST API, or the OpenAI SDK pointed at a Databricks serving endpoint), your request goes through &lt;/SPAN&gt;&lt;STRONG&gt;Databricks' infrastructure&lt;/STRONG&gt;&lt;SPAN&gt;, not directly to OpenAI.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI's automatic prompt caching works by:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Routing requests to a specific machine based on a &lt;/SPAN&gt;&lt;STRONG&gt;hash of the prompt prefix&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Caching prompts with &lt;/SPAN&gt;&lt;STRONG&gt;1024+ tokens&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Scoping caches to the &lt;/SPAN&gt;&lt;STRONG&gt;organization&lt;/STRONG&gt;&lt;SPAN&gt; making the API call&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
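&lt;P&gt;&lt;SPAN&gt;As a rough mental model (illustrative only -- the hashing details below are assumptions, not OpenAI's actual implementation), the routing behavior above can be sketched like this:&lt;/SPAN&gt;&lt;/P&gt;

```python
# Illustrative sketch only -- NOT OpenAI's real routing code. It models the
# documented behavior: requests are routed by a hash of the prompt prefix,
# and only prompts of 1024+ tokens are eligible for caching.
import hashlib

MIN_CACHEABLE_TOKENS = 1024  # OpenAI's documented minimum for caching

def cache_routing_key(prompt_tokens):
    """Return a routing key for the cacheable prefix, or None if too short."""
    if len(prompt_tokens) < MIN_CACHEABLE_TOKENS:
        return None
    prefix = prompt_tokens[:MIN_CACHEABLE_TOKENS]
    return hashlib.sha256(" ".join(prefix).encode("utf-8")).hexdigest()

short = [f"tok{i}" for i in range(500)]
long_a = [f"tok{i}" for i in range(1500)]
long_b = long_a[:1024] + ["different", "suffix"]

print(cache_routing_key(short))                                # None: under 1024 tokens
print(cache_routing_key(long_a) == cache_routing_key(long_b))  # True: shared prefix
```

&lt;P&gt;&lt;SPAN&gt;The key point: anything that changes the prompt prefix -- or that makes the request arrive from a different organization, as with a proxy -- produces a different cache key.&lt;/SPAN&gt;&lt;/P&gt;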
&lt;P&gt;&lt;SPAN&gt;Since Databricks is the one making the call to OpenAI (not you directly), the caching behavior is governed by how Databricks routes and batches these requests on their infrastructure. The cached_tokens = 0 result confirms that caching is &lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt; occurring through this path.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;3. What about the OpenAI SDK test?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;When you use the OpenAI SDK with identical model and settings but pointed at a &lt;/SPAN&gt;&lt;STRONG&gt;Databricks serving endpoint&lt;/STRONG&gt;&lt;SPAN&gt; (e.g., base_url = "&lt;A href="https://workspace.databricks.com/serving-endpoints" target="_blank"&gt;https://workspace.databricks.com/serving-endpoints&lt;/A&gt;"), you're still going through Databricks' proxy -- not hitting OpenAI directly. That's why cached_tokens = 0.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you point the OpenAI SDK directly at &lt;A href="https://api.openai.com" target="_blank"&gt;https://api.openai.com&lt;/A&gt; with your own OpenAI API key and repeat the test, you &lt;/SPAN&gt;&lt;STRONG&gt;will&lt;/STRONG&gt;&lt;SPAN&gt; see caching kick in (assuming 1024+ tokens and the same prompt prefix).&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Alternatives&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Option A: Call OpenAI directly&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If prompt caching savings are significant for your workload, bypass Databricks' Foundation Model APIs and call OpenAI's API directly using a Python UDF or notebook:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;import openai

client = openai.OpenAI(api_key="&amp;lt;your-openai-key&amp;gt;")  # Direct to OpenAI, bypassing Databricks

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "&amp;lt;your 1024+ token prompt&amp;gt;"}],
)

print(response.usage.prompt_tokens_details.cached_tokens)  # Nonzero on a repeated identical call indicates cache hits&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Option B: Use Databricks-hosted Claude with explicit caching&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks &lt;/SPAN&gt;&lt;STRONG&gt;does&lt;/STRONG&gt;&lt;SPAN&gt; support prompt caching for Claude models via the cache_control parameter in the Foundation Model API:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;import requests

response = requests.post(
    f"{db_host}/serving-endpoints/databricks-claude-sonnet-4/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "&amp;lt;long context&amp;gt;", "cache_control": {"type": "ephemeral"}},
                {"type": "text", "text": "Your question"}
            ]
        }]
    },
)&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Option C: Use an external model endpoint with AI Gateway&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Register your own OpenAI API key as an external model endpoint, which routes calls through Databricks' AI Gateway but directly to OpenAI. This may preserve caching behavior (though it's not guaranteed depending on routing).&lt;/SPAN&gt;&lt;/P&gt;
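&lt;P&gt;&lt;SPAN&gt;For illustration, a hedged sketch of what the endpoint registration payload might look like -- the endpoint name and secret path are placeholders, so verify the exact fields against the Databricks external models documentation:&lt;/SPAN&gt;&lt;/P&gt;

```python
# Hedged sketch: the JSON body for registering an external model endpoint via
# the Databricks serving-endpoints REST API (POST /api/2.0/serving-endpoints).
# "openai-gpt4o-external" and the secret scope/key are hypothetical placeholders.
payload = {
    "name": "openai-gpt4o-external",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                "external_model": {
                    "name": "gpt-4o",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        # reference a Databricks secret rather than a raw key
                        "openai_api_key": "{{secrets/my_scope/openai_api_key}}"
                    },
                }
            }
        ]
    },
}

print(payload["config"]["served_entities"][0]["external_model"]["provider"])  # openai
```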
&lt;H3&gt;&lt;STRONG&gt;Summary&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Path&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Caching Works?&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Why&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;ai_query via Databricks FMAPI&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Proxied through Databricks; no usage metadata returned&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI SDK via Databricks endpoint&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Still proxied through Databricks&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI SDK via api.openai.com directly&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Yes&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Direct connection, OpenAI handles routing + caching&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks FMAPI with Claude models&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Yes&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Explicit cache_control parameter supported&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;H3&gt;&lt;STRONG&gt;References&lt;/STRONG&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://developers.openai.com/api/docs/guides/prompt-caching" target="_blank"&gt;&lt;SPAN&gt;OpenAI Prompt Caching Guide&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/foundation-model-apis/" target="_blank"&gt;&lt;SPAN&gt;Databricks Foundation Model APIs&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/model-serving/score-foundation-models" target="_blank"&gt;&lt;SPAN&gt;Databricks -- Use Foundation Models&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://www.ddhigh.com/en/2026/03/26/fix-opencode-prompt-caching-with-third-party-proxy/" target="_blank"&gt;&lt;SPAN&gt;Fix Prompt Cache Misses with Third-Party Proxy&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Fri, 10 Apr 2026 04:31:17 GMT</pubDate>
    <dc:creator>anuj_lathi</dc:creator>
    <dc:date>2026-04-10T04:31:17Z</dc:date>
    <item>
      <title>ai_query and cached tokens</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-and-cached-tokens/m-p/153984#M1743</link>
      <description>&lt;P&gt;Is ai_query actually able to use OpenAI's cached tokens? I was not unable to prove it. The response object from ai_query does not contain the raw response, and when I re-run an identical request via OpenAI SDK (identical model, settings etc.) and examine the response, cached_tokens = 0, which indicates that caching doe snot work in this setup, for whatever reason.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 19:30:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-and-cached-tokens/m-p/153984#M1743</guid>
      <dc:creator>samuel86</dc:creator>
      <dc:date>2026-04-09T19:30:11Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query and cached tokens</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-and-cached-tokens/m-p/154020#M1744</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Great question -- this is a nuanced topic because there are two layers involved: &lt;/SPAN&gt;&lt;STRONG&gt;Databricks' proxy layer&lt;/STRONG&gt;&lt;SPAN&gt; and &lt;/SPAN&gt;&lt;STRONG&gt;OpenAI's caching mechanism&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Short answer: No, ai_query does not currently support OpenAI's prompt caching.&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;1. ai_query doesn't expose token usage metadata&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;ai_query is a SQL function that returns only the model's text response -- it does &lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt; return the full response object, including usage.prompt_tokens_details.cached_tokens. So even if caching were happening behind the scenes, you'd have no way to verify it from the ai_query output.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;2. Databricks Foundation Model APIs act as a proxy&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;When you call an OpenAI model through Databricks (whether via ai_query, the REST API, or the OpenAI SDK pointed at a Databricks serving endpoint), your request goes through &lt;/SPAN&gt;&lt;STRONG&gt;Databricks' infrastructure&lt;/STRONG&gt;&lt;SPAN&gt;, not directly to OpenAI.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI's automatic prompt caching works by:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Routing requests to a specific machine based on a &lt;/SPAN&gt;&lt;STRONG&gt;hash of the prompt prefix&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Caching prompts with &lt;/SPAN&gt;&lt;STRONG&gt;1024+ tokens&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Scoping caches to the &lt;/SPAN&gt;&lt;STRONG&gt;organization&lt;/STRONG&gt;&lt;SPAN&gt; making the API call&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
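&lt;P&gt;&lt;SPAN&gt;As a rough mental model (illustrative only -- the hashing details below are assumptions, not OpenAI's actual implementation), the routing behavior above can be sketched like this:&lt;/SPAN&gt;&lt;/P&gt;

```python
# Illustrative sketch only -- NOT OpenAI's real routing code. It models the
# documented behavior: requests are routed by a hash of the prompt prefix,
# and only prompts of 1024+ tokens are eligible for caching.
import hashlib

MIN_CACHEABLE_TOKENS = 1024  # OpenAI's documented minimum for caching

def cache_routing_key(prompt_tokens):
    """Return a routing key for the cacheable prefix, or None if too short."""
    if len(prompt_tokens) < MIN_CACHEABLE_TOKENS:
        return None
    prefix = prompt_tokens[:MIN_CACHEABLE_TOKENS]
    return hashlib.sha256(" ".join(prefix).encode("utf-8")).hexdigest()

short = [f"tok{i}" for i in range(500)]
long_a = [f"tok{i}" for i in range(1500)]
long_b = long_a[:1024] + ["different", "suffix"]

print(cache_routing_key(short))                                # None: under 1024 tokens
print(cache_routing_key(long_a) == cache_routing_key(long_b))  # True: shared prefix
```

&lt;P&gt;&lt;SPAN&gt;The key point: anything that changes the prompt prefix -- or that makes the request arrive from a different organization, as with a proxy -- produces a different cache key.&lt;/SPAN&gt;&lt;/P&gt;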
&lt;P&gt;&lt;SPAN&gt;Since Databricks is the one making the call to OpenAI (not you directly), the caching behavior is governed by how Databricks routes and batches these requests on their infrastructure. The cached_tokens = 0 result confirms that caching is &lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt; occurring through this path.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;3. What about the OpenAI SDK test?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;When you use the OpenAI SDK with identical model and settings but pointed at a &lt;/SPAN&gt;&lt;STRONG&gt;Databricks serving endpoint&lt;/STRONG&gt;&lt;SPAN&gt; (e.g., base_url = "&lt;A href="https://workspace.databricks.com/serving-endpoints" target="_blank"&gt;https://workspace.databricks.com/serving-endpoints&lt;/A&gt;"), you're still going through Databricks' proxy -- not hitting OpenAI directly. That's why cached_tokens = 0.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you point the OpenAI SDK directly at &lt;A href="https://api.openai.com" target="_blank"&gt;https://api.openai.com&lt;/A&gt; with your own OpenAI API key and repeat the test, you &lt;/SPAN&gt;&lt;STRONG&gt;will&lt;/STRONG&gt;&lt;SPAN&gt; see caching kick in (assuming 1024+ tokens and the same prompt prefix).&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Alternatives&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Option A: Call OpenAI directly&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If prompt caching savings are significant for your workload, bypass Databricks' Foundation Model APIs and call OpenAI's API directly using a Python UDF or notebook:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;import openai

client = openai.OpenAI(api_key="&amp;lt;your-openai-key&amp;gt;")  # Direct to OpenAI, bypassing Databricks

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "&amp;lt;your 1024+ token prompt&amp;gt;"}],
)

print(response.usage.prompt_tokens_details.cached_tokens)  # Nonzero on a repeated identical call indicates cache hits&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Option B: Use Databricks-hosted Claude with explicit caching&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks &lt;/SPAN&gt;&lt;STRONG&gt;does&lt;/STRONG&gt;&lt;SPAN&gt; support prompt caching for Claude models via the cache_control parameter in the Foundation Model API:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;import requests

response = requests.post(
    f"{db_host}/serving-endpoints/databricks-claude-sonnet-4/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "&amp;lt;long context&amp;gt;", "cache_control": {"type": "ephemeral"}},
                {"type": "text", "text": "Your question"}
            ]
        }]
    },
)&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Option C: Use an external model endpoint with AI Gateway&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Register your own OpenAI API key as an external model endpoint, which routes calls through Databricks' AI Gateway but directly to OpenAI. This may preserve caching behavior (though it's not guaranteed depending on routing).&lt;/SPAN&gt;&lt;/P&gt;
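&lt;P&gt;&lt;SPAN&gt;For illustration, a hedged sketch of what the endpoint registration payload might look like -- the endpoint name and secret path are placeholders, so verify the exact fields against the Databricks external models documentation:&lt;/SPAN&gt;&lt;/P&gt;

```python
# Hedged sketch: the JSON body for registering an external model endpoint via
# the Databricks serving-endpoints REST API (POST /api/2.0/serving-endpoints).
# "openai-gpt4o-external" and the secret scope/key are hypothetical placeholders.
payload = {
    "name": "openai-gpt4o-external",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                "external_model": {
                    "name": "gpt-4o",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        # reference a Databricks secret rather than a raw key
                        "openai_api_key": "{{secrets/my_scope/openai_api_key}}"
                    },
                }
            }
        ]
    },
}

print(payload["config"]["served_entities"][0]["external_model"]["provider"])  # openai
```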
&lt;H3&gt;&lt;STRONG&gt;Summary&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Path&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Caching Works?&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Why&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;ai_query via Databricks FMAPI&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Proxied through Databricks; no usage metadata returned&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI SDK via Databricks endpoint&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Still proxied through Databricks&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;OpenAI SDK via api.openai.com directly&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Yes&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Direct connection, OpenAI handles routing + caching&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks FMAPI with Claude models&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Yes&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Explicit cache_control parameter supported&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;H3&gt;&lt;STRONG&gt;References&lt;/STRONG&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://developers.openai.com/api/docs/guides/prompt-caching" target="_blank"&gt;&lt;SPAN&gt;OpenAI Prompt Caching Guide&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/foundation-model-apis/" target="_blank"&gt;&lt;SPAN&gt;Databricks Foundation Model APIs&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/model-serving/score-foundation-models" target="_blank"&gt;&lt;SPAN&gt;Databricks -- Use Foundation Models&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://www.ddhigh.com/en/2026/03/26/fix-opencode-prompt-caching-with-third-party-proxy/" target="_blank"&gt;&lt;SPAN&gt;Fix Prompt Cache Misses with Third-Party Proxy&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 10 Apr 2026 04:31:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-and-cached-tokens/m-p/154020#M1744</guid>
      <dc:creator>anuj_lathi</dc:creator>
      <dc:date>2026-04-10T04:31:17Z</dc:date>
    </item>
  </channel>
</rss>

