Hey all — looking for confirmation on a behavior I'm hitting on the Foundation Model API (pay-per-token) Anthropic-compatible endpoint, in case anyone else has worked around it.
What I'm doing: serving Claude models through /serving-endpoints/anthropic/v1/messages on the FMAPI pay-per-token tier. AAD bearer auth, U2M flow.
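For context, my client setup is roughly this (a sketch: the base_url shape and env var names are my own, and auth_token is the Anthropic Python SDK's bearer-token option):

import os
import anthropic

# Point the Anthropic Python SDK at the FMAPI Anthropic-compatible endpoint.
# The SDK appends /v1/messages to base_url; this base_url shape is my assumption.
client = anthropic.Anthropic(
    base_url=f"{os.environ['DATABRICKS_HOST']}/serving-endpoints/anthropic",
    auth_token=os.environ["DATABRICKS_TOKEN"],  # AAD bearer token from the U2M flow
)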
What fails: any request where the messages array ends with a turn of role: "assistant". The endpoint returns:
HTTP 400 BAD_REQUEST
{
  "error_code": "BAD_REQUEST",
  "message": "This model does not support assistant message prefill. The conversation must end with a user message."
}
Minimal repro shape:
{
  "model": "databricks-claude-opus-4-7",
  "max_tokens": 256,
  "messages": [
    {"role": "user", "content": "Complete the sentence:"},
    {"role": "assistant", "content": "The capital of France is"}
  ]
}
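Or as a self-contained script, if that's easier to run (hedged: the host and token env var names are mine):

import os
import requests

# Send the repro payload straight at the endpoint from the post.
url = f"{os.environ['DATABRICKS_HOST']}/serving-endpoints/anthropic/v1/messages"
payload = {
    "model": "databricks-claude-opus-4-7",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Complete the sentence:"},
        {"role": "assistant", "content": "The capital of France is"},
    ],
}
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=payload,
)
print(resp.status_code, resp.text)  # 400 plus the BAD_REQUEST body above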
Native Anthropic accepts this shape. It's the documented "assistant prefill" pattern, where the model continues from wherever the partial assistant text leaves off (the only native restriction I know of is that the prefill must not end in trailing whitespace). Common uses: forcing output formats, resuming after an interruption, certain tool-loop continuations.
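For comparison, the same shape against native Anthropic works (a sketch; the model ID is illustrative):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative; any prefill-capable Claude model
    max_tokens=64,
    messages=[
        {"role": "user", "content": "Complete the sentence:"},
        # Final assistant turn is the prefill; must not end with trailing whitespace.
        {"role": "assistant", "content": "The capital of France is"},
    ],
)
# Native Anthropic returns only the continuation, so the full text is the
# prefill plus resp.content[0].text (e.g. " Paris.").
print(resp.content[0].text)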
Why this is broader than one client: prefill is foundational in the Anthropic ecosystem. The Anthropic Python/TypeScript SDKs, LangChain's Anthropic provider, AutoGen, and most agent frameworks built on the Anthropic API treat it as a primitive. Anything routed to the FMAPI Anthropic endpoint that uses prefill gets a 400.
What I'm doing today: running a small proxy in front of FMAPI that strips trailing assistant messages before forwarding. Works for cases where prefill is incidental, but silently degrades any client that actually relies on prefill semantics (output-shaping flows especially).
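The core of the proxy rewrite is just this (a minimal sketch of my own logic, not an official workaround; the function name is mine):

def strip_trailing_assistant(messages: list[dict]) -> list[dict]:
    """Drop assistant turns from the tail so the conversation ends with a user turn.

    Note: this silently discards prefill semantics, which is exactly the
    degradation described above.
    """
    out = list(messages)
    while out and out[-1].get("role") == "assistant":
        out.pop()
    return out

# In the proxy, applied to the parsed request body before forwarding, e.g.:
# body["messages"] = strip_trailing_assistant(body["messages"])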
Questions:
- Is this a known/documented limitation of the FMAPI Anthropic endpoint?
- Is parity with native Anthropic on this feature planned?
- Has anyone found an official workaround other than client-side rewriting?
Thanks!