<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: FMAPI Anthropic endpoint rejects requests with trailing assistant message — known limitation? in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/fmapi-anthropic-endpoint-rejects-requests-with-trailing/m-p/156711#M1805</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/222018"&gt;@cormierjohn&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Your understanding is correct. The validation rejecting a trailing assistant turn is happening at the FMAPI proxy layer before the request reaches Claude, so any client that uses Anthropic's prefill primitive will 400 against this endpoint today. Quick pass on your three questions:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Known limitation?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Yes. It isn't called out as a feature gap in the FMAPI docs that I can point to, but the error string is purpose-built rather than incidental, so it's an intentional constraint of the current Anthropic-compatible surface, not a transient bug.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Parity planned?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Nothing I can share publicly on roadmap. If you want it tracked, the most reliable path is to file a feature request through your Databricks account team or via support so it lands in the FMAPI team's intake with a customer-attached use case.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Workarounds beyond client-side rewriting?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;A few that may cover specific use cases:
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Reframe prefill as a user instruction.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Move the partial assistant text into the final user turn ("Continue from exactly: 'The capital of France is '"). Imperfect, but preserves FMAPI routing for incidental prefill.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;stop_sequences&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;+ post-processing&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for output-shaping cases where prefill was only being used to constrain format.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Route prefill-dependent traffic to Anthropic directly&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for the specific flows that genuinely need prefill semantics (tool-loop continuations, strict structured output), and keep the rest on FMAPI for governance/billing. Two-lane is uglier than one, but it's the only path today that preserves prefill behavior exactly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Your stripping proxy is a reasonable bridge for the incidental cases. If you go that route, I'd log every time a trailing assistant turn gets dropped so you can quantify which clients are silently degraded and decide which ones move to the second lane.&lt;/P&gt;</description>
    <pubDate>Tue, 12 May 2026 23:09:33 GMT</pubDate>
    <dc:creator>stbjelcevic</dc:creator>
    <dc:date>2026-05-12T23:09:33Z</dc:date>
    <item>
      <title>FMAPI Anthropic endpoint rejects requests with trailing assistant message — known limitation?</title>
      <link>https://community.databricks.com/t5/generative-ai/fmapi-anthropic-endpoint-rejects-requests-with-trailing/m-p/156535#M1801</link>
      <description>&lt;P&gt;Hey all — looking for confirmation on a behavior I'm hitting on the Foundation Model API (pay-per-token) Anthropic-compatible endpoint, in case anyone else has worked around it.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I'm doing:&lt;/STRONG&gt; serving Claude models through /serving-endpoints/anthropic/v1/messages on the FMAPI pay-per-token tier. AAD bearer auth, U2M flow.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What fails:&lt;/STRONG&gt; any request where the messages array ends with a turn of role: "assistant". The endpoint returns:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;HTTP 400 BAD_REQUEST
{
"error_code": "BAD_REQUEST",
"message": "This model does not support assistant message prefill. The conversation must end with a user message."
}&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Minimal repro shape:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
  "model": "databricks-claude-opus-4-7",
  "max_tokens": 256,
  "messages": [
    {"role": "user", "content": "Complete the sentence:"},
    {"role": "assistant", "content": "The capital of France is "}
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;Native Anthropic accepts this — it's the documented "assistant prefill" pattern where the model continues from where the partial assistant text leaves off. Common uses: forcing output formats, resuming after interruption, certain tool-loop continuations.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Why this is broader than one client:&lt;/STRONG&gt; prefill is foundational in the Anthropic ecosystem. The Anthropic Python/TypeScript SDKs, LangChain's Anthropic provider, autogen and most agent frameworks built on the Anthropic API treat it as a primitive. Anything routed to FMAPI Anthropic that uses prefill gets a 400.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I'm doing today&lt;/STRONG&gt;: running a small proxy in front of FMAPI that strips trailing assistant messages before forwarding. Works for cases where prefill is incidental, but silently degrades any client that actually relies on prefill semantics (output-shaping flows especially).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Questions:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is this a known/documented limitation of the FMAPI Anthropic endpoint?&lt;/LI&gt;&lt;LI&gt;Is parity with native Anthropic on this feature planned?&lt;/LI&gt;&lt;LI&gt;Has anyone found an official workaround other than client-side rewriting?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;EM&gt;Thanks&lt;U&gt;!&lt;/U&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 May 2026 08:50:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/fmapi-anthropic-endpoint-rejects-requests-with-trailing/m-p/156535#M1801</guid>
      <dc:creator>cormierjohn</dc:creator>
      <dc:date>2026-05-11T08:50:03Z</dc:date>
    </item>
    <item>
      <title>Re: FMAPI Anthropic endpoint rejects requests with trailing assistant message — known limitation?</title>
      <link>https://community.databricks.com/t5/generative-ai/fmapi-anthropic-endpoint-rejects-requests-with-trailing/m-p/156711#M1805</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/222018"&gt;@cormierjohn&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Your understanding is correct. The validation rejecting a trailing assistant turn is happening at the FMAPI proxy layer before the request reaches Claude, so any client that uses Anthropic's prefill primitive will 400 against this endpoint today. Quick pass on your three questions:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Known limitation?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Yes. It isn't called out as a feature gap in the FMAPI docs that I can point to, but the error string is purpose-built rather than incidental, so it's an intentional constraint of the current Anthropic-compatible surface, not a transient bug.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Parity planned?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Nothing I can share publicly on roadmap. If you want it tracked, the most reliable path is to file a feature request through your Databricks account team or via support so it lands in the FMAPI team's intake with a customer-attached use case.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Workarounds beyond client-side rewriting?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;A few that may cover specific use cases:
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Reframe prefill as a user instruction.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Move the partial assistant text into the final user turn ("Continue from exactly: 'The capital of France is '"). Imperfect, but preserves FMAPI routing for incidental prefill.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;stop_sequences&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;+ post-processing&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for output-shaping cases where prefill was only being used to constrain format.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Route prefill-dependent traffic to Anthropic directly&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for the specific flows that genuinely need prefill semantics (tool-loop continuations, strict structured output), and keep the rest on FMAPI for governance/billing. Two-lane is uglier than one, but it's the only path today that preserves prefill behavior exactly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
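&lt;P&gt;For the first workaround, here is a rough sketch of what the rewrite could look like inside a proxy. It assumes the request body has the same shape as the repro in the original post and that the trailing assistant content is a plain string. The function name and logger name are made up for illustration, not part of any Databricks or Anthropic API, and the model is not guaranteed to continue verbatim from the quoted text.&lt;/P&gt;

```python
import copy
import logging

# Hypothetical logger name; use whatever your proxy already logs to.
logger = logging.getLogger("fmapi_prefill_rewrite")


def reframe_trailing_prefill(payload):
    """Return (new_payload, prefill) for a Messages-style request body.

    If the last message is an assistant turn with plain-text content, that
    text is folded into the final user turn as a "Continue from exactly"
    instruction, so the conversation ends with a user message.
    prefill is None when nothing was rewritten.
    """
    rewritten = copy.deepcopy(payload)  # never mutate the caller's request
    messages = rewritten.get("messages", [])

    if not messages or messages[-1].get("role") != "assistant":
        return rewritten, None

    prefill = messages[-1].get("content")
    if not isinstance(prefill, str):
        # Assumption: non-string content (content blocks) is out of scope
        # here and is passed through unchanged, so it would still be rejected.
        return rewritten, None

    messages.pop()
    instruction = "Continue from exactly: " + repr(prefill)

    last = messages[-1] if messages else None
    if last and last.get("role") == "user" and isinstance(last.get("content"), str):
        # Append to the existing user turn instead of adding a second one.
        last["content"] = last["content"] + "\n\n" + instruction
    else:
        messages.append({"role": "user", "content": instruction})

    # Log each rewrite so degraded clients can be counted later.
    logger.warning("rewrote trailing assistant prefill (%d chars)", len(prefill))
    return rewritten, prefill
```

&lt;P&gt;Returning the original prefill alongside the rewritten body leaves it to the caller to decide how to handle a reply that echoes the quoted text.&lt;/P&gt;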
&lt;P&gt;Your stripping proxy is a reasonable bridge for the incidental cases. If you go that route, I'd log every time a trailing assistant turn gets dropped so you can quantify which clients are silently degraded and decide which ones move to the second lane.&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2026 23:09:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/fmapi-anthropic-endpoint-rejects-requests-with-trailing/m-p/156711#M1805</guid>
      <dc:creator>stbjelcevic</dc:creator>
      <dc:date>2026-05-12T23:09:33Z</dc:date>
    </item>
  </channel>
</rss>

