<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pydantic usage for structured output with provisioned LLM in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126231#M1054</link>
    <description>&lt;P&gt;I think it depends on what your overall use case is.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If you're looking to extract text from images / documents specifically using Databricks, then you could consider ai_parse_document, which provides a structured extraction of text and OCR content from files:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document" target="_blank"&gt;ai_parse_document function | Databricks Documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you're looking to query an LLM in bulk / batch, you should consider calling Claude with ai_query, which supports structured outputs to a certain degree using the responseFormat argument:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query" target="_blank"&gt;Databricks Documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you're looking to ping an LLM endpoint in a more one-at-a-time way, then you'll need to query the Databricks endpoints somehow.&lt;UL&gt;&lt;LI&gt;The often touted "simplest" way is to use the openai library, which supports structured outputs using tools such as Pydantic:&amp;nbsp;&lt;A href="https://openai.com/index/introducing-structured-outputs-in-the-api/" target="_blank"&gt;Introducing Structured Outputs in the API | OpenAI&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;IMO, I find LangChain a lot easier to work with, but that might just be because I've been using it for about two years, so I've learned to think about LLMs in a chain-y way. As I said above, it has a Pydantic output parser too:&amp;nbsp;&lt;A href="https://api.python.langchain.com/en/latest/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html" target="_blank"&gt;PydanticOutputParser | 🦜&lt;span class="lia-unicode-emoji" title=":link:"&gt;🔗&lt;/span&gt; LangChain documentation&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In the end, I wouldn't say LangChain is an unnecessarily heavy framework, and it carries a lot of tools, docs, and examples which can help you upskill quickly. If you really want to keep it as minimal as possible, then use the openai library. However, as said, I'd personally recommend the LangChain links I've given above.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jul 2025 15:48:26 GMT</pubDate>
    <dc:creator>jAAmes_bentley</dc:creator>
    <dc:date>2025-07-23T15:48:26Z</dc:date>
    <item>
      <title>Pydantic usage for structured output with provisioned LLM</title>
      <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126068#M1044</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am looking for a resource that has examples of using Pydantic with a provisioned LLM on Databricks to get structured output.&lt;/P&gt;&lt;P&gt;I can find many examples of using Pydantic with LLMs, but none on Databricks.&lt;/P&gt;&lt;P&gt;My use case is to extract text from images into a structured format using one of the provisioned LLMs on Databricks. For the LLM I would like to use Claude. Any help is greatly appreciated.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Frank&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 19:27:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126068#M1044</guid>
      <dc:creator>fcardoze</dc:creator>
      <dc:date>2025-07-22T19:27:59Z</dc:date>
    </item>
    <item>
      <title>Re: Pydantic usage for structured output with provisioned LLM</title>
      <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126075#M1045</link>
      <description>&lt;P&gt;Are you looking to run models in batch, or in a more traditional framework like LangChain? If the latter, you could use the Pydantic output parser with databricks-langchain:&lt;/P&gt;&lt;P&gt;&lt;A href="https://pypi.org/project/databricks-langchain/" target="_blank"&gt;databricks-langchain · PyPI&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://api.python.langchain.com/en/latest/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html" target="_blank"&gt;PydanticOutputParser | 🦜&lt;span class="lia-unicode-emoji" title=":link:"&gt;🔗&lt;/span&gt; LangChain documentation&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 21:44:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126075#M1045</guid>
      <dc:creator>jAAmes_bentley</dc:creator>
      <dc:date>2025-07-22T21:44:38Z</dc:date>
    </item>
    <item>
      <title>Re: Pydantic usage for structured output with provisioned LLM</title>
      <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126207#M1053</link>
      <description>&lt;P&gt;Thanks for the reply. As I understand it, LangChain is a framework that relies on Pydantic. I was trying to keep the number of frameworks to a minimum for my use case, which is strictly to get a structured output. I am still learning, so perhaps I am approaching this the wrong way?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 13:50:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126207#M1053</guid>
      <dc:creator>fcardoze</dc:creator>
      <dc:date>2025-07-23T13:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: Pydantic usage for structured output with provisioned LLM</title>
      <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126231#M1054</link>
      <description>&lt;P&gt;I think it depends on what your overall use case is.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If you're looking to extract text from images / documents specifically using Databricks, then you could consider ai_parse_document, which provides a structured extraction of text and OCR content from files:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document" target="_blank"&gt;ai_parse_document function | Databricks Documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you're looking to query an LLM in bulk / batch, you should consider calling Claude with ai_query, which supports structured outputs to a certain degree using the responseFormat argument:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query" target="_blank"&gt;Databricks Documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you're looking to ping an LLM endpoint in a more one-at-a-time way, then you'll need to query the Databricks endpoints somehow.&lt;UL&gt;&lt;LI&gt;The often touted "simplest" way is to use the openai library, which supports structured outputs using tools such as Pydantic:&amp;nbsp;&lt;A href="https://openai.com/index/introducing-structured-outputs-in-the-api/" target="_blank"&gt;Introducing Structured Outputs in the API | OpenAI&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;IMO, I find LangChain a lot easier to work with, but that might just be because I've been using it for about two years, so I've learned to think about LLMs in a chain-y way. As I said above, it has a Pydantic output parser too:&amp;nbsp;&lt;A href="https://api.python.langchain.com/en/latest/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html" target="_blank"&gt;PydanticOutputParser | 🦜&lt;span class="lia-unicode-emoji" title=":link:"&gt;🔗&lt;/span&gt; LangChain documentation&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In the end, I wouldn't say LangChain is an unnecessarily heavy framework, and it carries a lot of tools, docs, and examples which can help you upskill quickly. If you really want to keep it as minimal as possible, then use the openai library. However, as said, I'd personally recommend the LangChain links I've given above.&lt;/P&gt;</description>
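      <content:encoded><![CDATA[
A minimal sketch of the Pydantic-only side of the "keep it minimal" route from this reply, assuming Pydantic v2 is installed. The names ExtractedText (and its fields), build_response_format, parse_reply, and the sample JSON are hypothetical and only for illustration. The dict follows the OpenAI-style json_schema shape for response_format; whether a particular Databricks endpoint accepts it is an assumption to check against the serving docs. No endpoint is called here: a hard-coded string stands in for the model's reply.

```python
import json
from typing import Optional, Type

from pydantic import BaseModel, ValidationError


class ExtractedText(BaseModel):
    # Hypothetical schema for illustration; adapt the fields to your documents.
    title: str
    body: str
    page_count: int


def build_response_format(model: Type[BaseModel]) -> dict:
    """Wrap the model's JSON Schema in an OpenAI-style response_format dict."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": model.__name__,
            "schema": model.model_json_schema(),
            "strict": True,
        },
    }


def parse_reply(raw: str) -> Optional[ExtractedText]:
    """Validate a raw JSON reply; return None if it does not fit the schema."""
    try:
        return ExtractedText.model_validate_json(raw)
    except ValidationError:
        return None


# Stand-in for the text an endpoint might return (not real model output).
sample_reply = json.dumps(
    {"title": "Invoice 17", "body": "Total due: 40 EUR", "page_count": 1}
)
result = parse_reply(sample_reply)

# A reply with missing fields fails validation and yields None.
bad = parse_reply('{"title": "missing fields"}')

response_format = build_response_format(ExtractedText)
```

With the openai client pointed at a serving endpoint, the dict from build_response_format would be passed as the response_format argument and the returned message content handed to parse_reply. Validating on the way back is still worthwhile, since the reply may not match the schema.
]]></content:encoded>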
      <pubDate>Wed, 23 Jul 2025 15:48:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126231#M1054</guid>
      <dc:creator>jAAmes_bentley</dc:creator>
      <dc:date>2025-07-23T15:48:26Z</dc:date>
    </item>
    <item>
      <title>Re: Pydantic usage for structured output with provisioned LLM</title>
      <link>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126234#M1055</link>
      <description>&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 16:13:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/pydantic-usage-for-structured-output-with-provisioned-llm/m-p/126234#M1055</guid>
      <dc:creator>fcardoze</dc:creator>
      <dc:date>2025-07-23T16:13:13Z</dc:date>
    </item>
  </channel>
</rss>

