<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Model serving endpoint creation failed in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/model-serving-endpoint-creation-failed/m-p/107749#M3933</link>
    <description>&lt;P&gt;I have a logged pyfunc MLflow model that runs without issues in a Databricks notebook using&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"mlflow.pyfunc.&lt;/SPAN&gt;&lt;SPAN&gt;load_model&lt;/SPAN&gt;&lt;SPAN&gt;()". I can deploy it without issues as a model serving endpoint with "workload_type" set to GPU, but when I try to update the endpoint to CPU it fails with this repeating error:&amp;nbsp;&lt;BR /&gt;"[pb897] [2025-01-29 14:50:40 +0000] [4014] [INFO] Booting worker with pid: 4014 [pb897] [2025-01-29 14:50:42 +0000] [9] [ERROR] Worker (pid:3932) was sent code 132!"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Why can the exact same configuration run in an environment with a GPU but not in a CPU-only environment?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I have also tried deleting the endpoint and re-creating it with the CPU config.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Thu, 30 Jan 2025 09:11:45 GMT</pubDate>
    <dc:creator>sodrberg</dc:creator>
    <dc:date>2025-01-30T09:11:45Z</dc:date>
    <item>
      <title>Model serving endpoint creation failed</title>
      <link>https://community.databricks.com/t5/machine-learning/model-serving-endpoint-creation-failed/m-p/107749#M3933</link>
      <description>&lt;P&gt;I have a logged pyfunc MLflow model that runs without issues in a Databricks notebook using&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"mlflow.pyfunc.&lt;/SPAN&gt;&lt;SPAN&gt;load_model&lt;/SPAN&gt;&lt;SPAN&gt;()". I can deploy it without issues as a model serving endpoint with "workload_type" set to GPU, but when I try to update the endpoint to CPU it fails with this repeating error:&amp;nbsp;&lt;BR /&gt;"[pb897] [2025-01-29 14:50:40 +0000] [4014] [INFO] Booting worker with pid: 4014 [pb897] [2025-01-29 14:50:42 +0000] [9] [ERROR] Worker (pid:3932) was sent code 132!"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Why can the exact same configuration run in an environment with a GPU but not in a CPU-only environment?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I have also tried deleting the endpoint and re-creating it with the CPU config.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 30 Jan 2025 09:11:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-serving-endpoint-creation-failed/m-p/107749#M3933</guid>
      <dc:creator>sodrberg</dc:creator>
      <dc:date>2025-01-30T09:11:45Z</dc:date>
    </item>
    <item>
      <title>Re: Model serving endpoint creation failed</title>
      <link>https://community.databricks.com/t5/machine-learning/model-serving-endpoint-creation-failed/m-p/108700#M3943</link>
      <description>&lt;DIV class="p-field_section p-field_section--stacked"&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;The error encountered when updating the endpoint to a CPU-only configuration could be due to several reasons related to dependency and environment configuration mismatches:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;•&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Dependency Mismatch&lt;/STRONG&gt;: The error may be related to mismatched dependencies or environment configurations between the GPU and CPU environments. When the model was initially deployed with GPU support, it might have utilized dependencies specific to the GPU environment that are not compatible or missing in the CPU-only environment. This is often the case when certain libraries or dependencies are optimized for GPU usage and are not available or configured differently for CPU usage.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;•&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Incompatible Python or Package Versions&lt;/STRONG&gt;: The error could also stem from differences in Python versions or package versions (such as&amp;nbsp;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;cloudpickle&lt;/CODE&gt;&amp;nbsp;or&amp;nbsp;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;pandas&lt;/CODE&gt;) between the environments used for logging, deploying, and serving the model. Ensuring that the Python and package versions are consistent across all environments is critical, as discrepancies can lead to runtime errors.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;•&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Model Dependency Configuration&lt;/STRONG&gt;: If the model's dependencies were not explicitly specified or captured correctly when logged, the serving environment might not have all the necessary packages. It's important to ensure that all required dependencies are included in the&amp;nbsp;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;requirements.txt&lt;/CODE&gt;&amp;nbsp;or&amp;nbsp;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;conda.yaml&lt;/CODE&gt;&amp;nbsp;files when logging the model.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;•&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Recreating the Endpoint&lt;/STRONG&gt;: Deleting and recreating the endpoint with a CPU configuration might not resolve the issue if the underlying problem with dependency or environment configuration persists. It is essential to validate and ensure the compatibility of the model and its dependencies with the CPU environment before redeploying.To address these issues:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;&lt;BR /&gt;1.&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Validate Dependencies&lt;/STRONG&gt;: Ensure that all required dependencies are explicitly specified and compatible with the CPU environment.&lt;BR /&gt;2.&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Environment Consistency&lt;/STRONG&gt;: Verify that the Python and package versions match those used during the model logging and...&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="p-field_section p-field_section--stacked"&gt;
&lt;DIV class="p-mrkdwn_element"&gt;&lt;SPAN data-qa="bk_markdown_element"&gt;...registration.&lt;BR /&gt;3.&amp;nbsp;&lt;STRONG data-stringify-type="bold"&gt;Test Locally&lt;/STRONG&gt;: Test the model in a local CPU environment to identify any dependency issues before deploying.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Tue, 04 Feb 2025 05:34:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-serving-endpoint-creation-failed/m-p/108700#M3943</guid>
      <dc:creator>kamal_ch</dc:creator>
      <dc:date>2025-02-04T05:34:12Z</dc:date>
    </item>
  </channel>
</rss>

