<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to speed up inference? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135959#M4362</link>
    <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In Databricks, the most efficient way to handle multiple machine learning models for inference — especially when each model has its own inference logic — is to use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;batch inference with Spark DataFrames and Pandas UDFs&lt;/STRONG&gt;. Instead of looping over your models sequentially in Python, you can parallelize inference across your data and model configurations using Spark’s distributed capabilities.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Batch Inference with Spark DataFrames&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Databricks recommends structuring your data in a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Spark DataFrame&lt;/STRONG&gt;, where each row represents an item for prediction and may include metadata indicating which model to use. The workflow typically includes:&lt;/P&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Loading your data&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;into a Spark DataFrame (from Unity Catalog, Delta tables, or external sources).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Loading your models&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;from the MLflow Model Registry.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Creating spark UDFs&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for inference using&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;mlflow.pyfunc.spark_udf()&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Applying the UDFs&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to your DataFrame to generate predictions in bulk.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; mlflow
&lt;SPAN class="token token"&gt;from&lt;/SPAN&gt; pyspark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;sql &lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; functions &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; F

predict_udf &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; mlflow&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;pyfunc&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;spark_udf&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;spark&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; model_uri&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"models:/my_model/Production"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;withColumn&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"prediction"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; predict_udf&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token operator"&gt;*&lt;/SPAN&gt;df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;columns&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;mode&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;saveAsTable&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"predictions_output"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This approach allows Spark to distribute inference tasks across multiple executors, avoiding Python’s sequential bottlenecks.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Parallel Multi-Model Inference&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When dealing with multiple models (e.g., per client or product), Databricks supports&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;parallel batch inference&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;using the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;groupBy.applyInPandas()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method combined with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Pandas UDFs&lt;/STRONG&gt;. Each Spark worker can handle inference for a different model, allowing you to:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Load each model once per worker process.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Process subsets of data in parallel.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example pattern:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;run_inference&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;pdf&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    model_path &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'model_path'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;iloc&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;0&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;
    model &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; mlflow&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;pyfunc&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load_model&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;model_path&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'prediction'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt; &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; model&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;predict&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'features'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; pdf

result_df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;groupBy&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"model_id"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;applyInPandas&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;run_inference&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; schema&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;schema&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This design reduces redundant model loading and uses Spark’s distributed compute layer efficiently.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Mosaic AI &amp;amp; AI Functions&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If your inference needs involve standard ML or LLM models, you can simplify further with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;AI Functions&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Mosaic AI batch inference&lt;/STRONG&gt;. These let you run model inference directly via SQL using functions like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;ai_query()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;without manual looping or building pipelines.&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example SQL:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;sql&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;SELECT&lt;/SPAN&gt; input_text&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; ai_query&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'my_registered_model'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; input_text&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;AS&lt;/SPAN&gt; prediction
&lt;SPAN class="token token"&gt;FROM&lt;/SPAN&gt; my_input_table
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Summary of Best Practices&lt;/H2&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Technique&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Use Case&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Advantage&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Spark UDFs (&lt;CODE&gt;mlflow.pyfunc.spark_udf&lt;/CODE&gt;)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Standard ML models&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Simplifies batch scoring&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Pandas UDFs with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;groupBy.applyInPandas&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Many models (per group)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Parallel per-model inference&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Mosaic AI / AI Functions&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;LLMs or unified inference&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Simplified SQL-based scaling&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Delta Live Tables (DLT)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Scheduled, repeatable jobs&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Automates production batch runs&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;DIV class="bg-base border-subtler shadow-subtle pointer-coarse:opacity-100 right-xs absolute bottom-0 flex rounded-lg border opacity-0 transition-opacity group-hover:opacity-100 [&amp;amp;&amp;gt;*:not(:first-child)]:border-subtle [&amp;amp;&amp;gt;*:not(:first-child)]:border-l"&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In short, replace your Python loops with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Spark-level Pandas UDFs or Databricks batch inference functions&lt;/STRONG&gt;. This takes advantage of cluster parallelism and avoids sequential execution, allowing all your model inferences to run efficiently in parallel across nodes.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Fri, 24 Oct 2025 14:05:42 GMT</pubDate>
    <dc:creator>mark_ott</dc:creator>
    <dc:date>2025-10-24T14:05:42Z</dc:date>
    <item>
      <title>how to speed up inference?</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135900#M4360</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;I'm new to this concept, but we have several ML models that follow the same structure from the code. What I don’t fully understand is how to handle different types of models efficiently — right now, I need to loop through my items to get the inference for each model, and each one requires a specific inference process.&lt;BR /&gt;How can I speed this up using some kind of batch inference step?&lt;/P&gt;</description>
      <pubDate>Thu, 23 Oct 2025 21:09:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135900#M4360</guid>
      <dc:creator>jeremy98</dc:creator>
      <dc:date>2025-10-23T21:09:19Z</dc:date>
    </item>
    <item>
      <title>Re: how to speed up inference?</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135959#M4362</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In Databricks, the most efficient way to handle multiple machine learning models for inference — especially when each model has its own inference logic — is to use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;batch inference with Spark DataFrames and Pandas UDFs&lt;/STRONG&gt;. Instead of looping over your models sequentially in Python, you can parallelize inference across your data and model configurations using Spark’s distributed capabilities.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Batch Inference with Spark DataFrames&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Databricks recommends structuring your data in a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Spark DataFrame&lt;/STRONG&gt;, where each row represents an item for prediction and may include metadata indicating which model to use. The workflow typically includes:&lt;/P&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Loading your data&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;into a Spark DataFrame (from Unity Catalog, Delta tables, or external sources).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Loading your models&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;from the MLflow Model Registry.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Creating spark UDFs&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for inference using&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;mlflow.pyfunc.spark_udf()&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Applying the UDFs&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to your DataFrame to generate predictions in bulk.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; mlflow
&lt;SPAN class="token token"&gt;from&lt;/SPAN&gt; pyspark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;sql &lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; functions &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; F

predict_udf &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; mlflow&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;pyfunc&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;spark_udf&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;spark&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; model_uri&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"models:/my_model/Production"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;withColumn&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"prediction"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; predict_udf&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token operator"&gt;*&lt;/SPAN&gt;df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;columns&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;mode&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;saveAsTable&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"predictions_output"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This approach allows Spark to distribute inference tasks across multiple executors, avoiding Python’s sequential bottlenecks.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Parallel Multi-Model Inference&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When dealing with multiple models (e.g., per client or product), Databricks supports&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;parallel batch inference&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;using the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;groupBy.applyInPandas()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method combined with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Pandas UDFs&lt;/STRONG&gt;. Each Spark worker can handle inference for a different model, allowing you to:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Load each model once per worker process.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Process subsets of data in parallel.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example pattern:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;run_inference&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;pdf&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    model_path &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'model_path'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;iloc&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;0&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;
    model &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; mlflow&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;pyfunc&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load_model&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;model_path&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'prediction'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt; &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; model&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;predict&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;pdf&lt;SPAN class="token token punctuation"&gt;[&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'features'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;]&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; pdf

result_df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;groupBy&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"model_id"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;applyInPandas&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;run_inference&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; schema&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;schema&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This design reduces redundant model loading and uses Spark’s distributed compute layer efficiently.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Mosaic AI &amp;amp; AI Functions&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If your inference needs involve standard ML or LLM models, you can simplify further with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;AI Functions&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Mosaic AI batch inference&lt;/STRONG&gt;. These let you run model inference directly via SQL using functions like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;ai_query()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;without manual looping or building pipelines.&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Example SQL:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;sql&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;SELECT&lt;/SPAN&gt; input_text&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; ai_query&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'my_registered_model'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; input_text&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;AS&lt;/SPAN&gt; prediction
&lt;SPAN class="token token"&gt;FROM&lt;/SPAN&gt; my_input_table
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Summary of Best Practices&lt;/H2&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Technique&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Use Case&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Advantage&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Spark UDFs (&lt;CODE&gt;mlflow.pyfunc.spark_udf&lt;/CODE&gt;)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Standard ML models&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Simplifies batch scoring&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Pandas UDFs with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;groupBy.applyInPandas&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Many models (per group)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Parallel per-model inference&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Mosaic AI / AI Functions&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;LLMs or unified inference&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Simplified SQL-based scaling&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Delta Live Tables (DLT)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Scheduled, repeatable jobs&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Automates production batch runs&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;DIV class="bg-base border-subtler shadow-subtle pointer-coarse:opacity-100 right-xs absolute bottom-0 flex rounded-lg border opacity-0 transition-opacity group-hover:opacity-100 [&amp;amp;&amp;gt;*:not(:first-child)]:border-subtle [&amp;amp;&amp;gt;*:not(:first-child)]:border-l"&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In short, replace your Python loops with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Spark-level Pandas UDFs or Databricks batch inference functions&lt;/STRONG&gt;. This takes advantage of cluster parallelism and avoids sequential execution, allowing all your model inferences to run efficiently in parallel across nodes.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Fri, 24 Oct 2025 14:05:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135959#M4362</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-24T14:05:42Z</dc:date>
    </item>
    <item>
      <title>Re: how to speed up inference?</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135962#M4363</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133094"&gt;@jeremy98&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have not tried this - but could using Python's multiprocessing library to assign the inference for different models to different CPU cores be something you would want to give an attempt?&lt;/P&gt;
&lt;P&gt;Also here's a useful blog -&amp;nbsp;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://docs.databricks.com/en/machine-learning/model-serving/serve-multiple-models-to-serving-endpoint.html" target="_blank" rel="noopener noreferrer" data-stringify-link="https://docs.databricks.com/en/machine-learning/model-serving/serve-multiple-models-to-serving-endpoint.html" data-sk="tooltip_parent"&gt;https://docs.databricks.com/en/machine-learning/model-serving/serve-multiple-models-to-serving-endpoint.html&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2022/07/20/parallel-ml-how-compass-built-a-framework-for-training-many-machine-learning-models-on-databricks.html" target="_blank"&gt;https://www.databricks.com/blog/2022/07/20/parallel-ml-how-compass-built-a-framework-for-training-many-machine-learning-models-on-databricks.html&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2025 14:08:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-speed-up-inference/m-p/135962#M4363</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-10-24T14:08:55Z</dc:date>
    </item>
  </channel>
</rss>

