<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to implement prompt caching using Claude models? in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/131334#M1131</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;,&lt;BR /&gt;good day.&lt;/P&gt;&lt;P&gt;We have an Azure Databricks customer who asked the same question and would like to know whether there is a roadmap for supporting prompt caching on&lt;STRONG&gt;&amp;nbsp;serving endpoints.&lt;BR /&gt;The customer mentioned that AWS Bedrock appears to support a prompt caching feature.&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 09 Sep 2025 06:42:53 GMT</pubDate>
    <dc:creator>XianCao_98793</dc:creator>
    <dc:date>2025-09-09T06:42:53Z</dc:date>
    <item>
      <title>How to implement prompt caching using Claude models?</title>
      <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/129766#M1114</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to use the prompt caching feature with the&amp;nbsp;"databricks-claude-sonnet-4" Databricks endpoint (wrapped in a ChatDatabricks instance). Using LangChain, I set&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SystemMessage(
    content=[
        {
            "text": cached_doc_prompt,
            "type": "text",
            "cache_control": {"type": "ephemeral"},
        }
    ]
),&lt;/LI-CODE&gt;&lt;P&gt;for the part of the message I want to cache.&lt;/P&gt;&lt;P&gt;I get this error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Response text: {"error_code":"BAD_REQUEST","message":"BAD_REQUEST: Databricks does not support prompt cache for the first-party Anthropic model."}&lt;/LI-CODE&gt;&lt;P&gt;How can prompt caching be achieved in Databricks?&lt;/P&gt;&lt;P&gt;Thanks for your help!&lt;/P&gt;</description>
      <pubDate>Tue, 26 Aug 2025 08:00:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/129766#M1114</guid>
      <dc:creator>DinoSaluzzi</dc:creator>
      <dc:date>2025-08-26T08:00:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement prompt caching using Claude models?</title>
      <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/129852#M1116</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/178191"&gt;@DinoSaluzzi&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Anthropic’s prompt caching is not supported via Databricks endpoints → the 400 error is expected:&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="WiliamRosa_0-1756240274309.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19394iE7F5CBD80F2E8878/image-size/medium?v=v2&amp;amp;px=400" role="button" title="WiliamRosa_0-1756240274309.png" alt="WiliamRosa_0-1756240274309.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;To use real prompt caching → call Anthropic’s API directly.&lt;BR /&gt;To stay within Databricks → adopt alternatives such as pseudo-cache, RAG, or context compression.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Overview with RAG:&lt;/STRONG&gt;&lt;BR /&gt;Instead of sending the entire context with every query:&lt;/P&gt;&lt;P&gt;- Ingest the content into Databricks Vector Search (or another vector store).&lt;BR /&gt;- For each user question, retrieve only the most relevant chunks (top-k) and attach them to the prompt.&lt;/P&gt;&lt;P&gt;Pros: dynamically reduces tokens, scales well with large documents.&lt;BR /&gt;Cons: requires an embeddings + retrieval pipeline, plus tuning of chunking/top-k.&lt;/P&gt;&lt;P&gt;High-level skeleton:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# 1) Index documents (embeddings) → Vector Search / FAISS
# 2) For each question:
# (assumes: from langchain_core.messages import SystemMessage, HumanMessage)
query = "User question"
contexts = retriever.get_relevant_documents(query)  # top-k
context_text = "\n\n".join(d.page_content for d in contexts)

msgs = [
    SystemMessage(content="Answer using only the provided context."),
    HumanMessage(content=f"Context:\n{context_text}\n\nQ: {query}")
]
resp = llm.invoke(msgs)&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 26 Aug 2025 20:33:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/129852#M1116</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-26T20:33:49Z</dc:date>
    </item>
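    <!--
    The "pseudo-cache" alternative mentioned in the reply above can be sketched as an
    application-level memo of full responses, keyed by a hash of the rendered prompt.
    Unlike Anthropic's server-side prompt caching, this skips the whole model call for
    repeated identical prompts rather than discounting input tokens. The `fake_llm`
    stand-in below is a hypothetical placeholder; a real deployment would call the
    serving endpoint instead.

```python
import hashlib

class PseudoCache:
    """Memoize full responses, keyed by a SHA-256 of the rendered prompt."""

    def __init__(self, llm_call):
        self._llm_call = llm_call   # any callable taking a prompt str, returning a str
        self._store = {}
        self.hits = 0
        self.misses = 0

    def invoke(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._llm_call(prompt)
        self._store[key] = response
        return response

# Hypothetical stand-in for the real endpoint call:
calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return "answer for: " + prompt

cache = PseudoCache(fake_llm)
first = cache.invoke("What is Apache Spark?")
second = cache.invoke("What is Apache Spark?")  # served from cache; no second call
```

    This trades freshness for cost: a cached answer never changes until evicted, so it
    only suits prompts whose correct answer is stable.
    -->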
    <item>
      <title>Re: How to implement prompt caching using Claude models?</title>
      <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/130179#M1119</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Thanks for your response!&lt;BR /&gt;I still need prompt caching, as my use case requires it.&lt;BR /&gt;Other Databricks endpoints seem to accept the request, such as 'databricks-gpt-oss-120b' (using the same logic you shared in your message). However, I could not confirm actual caching, since I cannot access token usage for these queries.&lt;BR /&gt;&lt;BR /&gt;Best regards!&lt;/P&gt;</description>
      <pubDate>Fri, 29 Aug 2025 15:47:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/130179#M1119</guid>
      <dc:creator>DinoSaluzzi</dc:creator>
      <dc:date>2025-08-29T15:47:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement prompt caching using Claude models?</title>
      <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/131334#M1131</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;,&lt;BR /&gt;good day.&lt;/P&gt;&lt;P&gt;We have an Azure Databricks customer who asked the same question and would like to know whether there is a roadmap for supporting prompt caching on&lt;STRONG&gt;&amp;nbsp;serving endpoints.&lt;BR /&gt;The customer mentioned that AWS Bedrock appears to support a prompt caching feature.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Sep 2025 06:42:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/131334#M1131</guid>
      <dc:creator>XianCao_98793</dc:creator>
      <dc:date>2025-09-09T06:42:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement prompt caching using Claude models?</title>
      <link>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/139259#M1405</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/178191"&gt;@DinoSaluzzi&lt;/a&gt;&amp;nbsp;&lt;SPAN&gt;If you can restructure your system &amp;amp; user prompts in a similar manner to the example provided, prompt caching should start working as expected.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are a helpful Apache Spark expert. Always provide concise, technical answers.",
                "cache_control": {"type": "ephemeral"}
            }
        ]
    },
    {
        "role": "user",
        "content": "What are the top 3 benefits of using Apache Spark?"
    }
]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I can confirm that both&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;cache_read_input_tokens&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;and&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;cache_creation_input_tokens&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;are updating correctly, which indicates that caching is being applied.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Please note that&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;prompt caching does not activate for smaller prompts&lt;/STRONG&gt;&lt;SPAN&gt;; it is typically triggered only once the prompt size crosses a certain threshold.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 04:25:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/how-to-implement-prompt-caching-using-claude-models/m-p/139259#M1405</guid>
      <dc:creator>Pradeep54</dc:creator>
      <dc:date>2025-11-17T04:25:55Z</dc:date>
    </item>
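    <!--
    The threshold behavior described in the reply above suggests only attaching
    `cache_control` when the block is plausibly large enough to cache. The helper
    below is a minimal sketch under assumptions: the character-count heuristic and
    the `min_cache_chars` parameter are illustrative inventions, not an API rule
    (Anthropic documents a minimum cacheable length of roughly 1024 tokens for
    Sonnet-class models).

```python
def build_messages(system_text, user_text, min_cache_chars=4096):
    """Build a chat payload, marking the system block ephemeral-cacheable
    only when it is long enough to plausibly cross the provider's
    prompt-caching threshold (heuristic, measured in characters)."""
    system_block = {"type": "text", "text": system_text}
    if len(system_text) >= min_cache_chars:
        system_block["cache_control"] = {"type": "ephemeral"}
    return [
        {"role": "system", "content": [system_block]},
        {"role": "user", "content": user_text},
    ]

# Short system prompt: below the threshold, so no cache_control is added.
short = build_messages("You are a helpful Apache Spark expert.",
                       "What are the top 3 benefits of using Apache Spark?")

# Long system prompt: exceeds the threshold, so cache_control is attached.
big_system = "Reference documentation text. " * 500
cached = build_messages(big_system,
                        "What are the top 3 benefits of using Apache Spark?")
```

    After sending such a payload to the endpoint, whether caching actually applied
    can be confirmed from the usage fields named in the reply above.
    -->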
  </channel>
</rss>

