<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DBRX - Serving endpoint failed - update timed out. in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/dbrx-serving-endpoint-failed-update-timed-out/m-p/71582#M3330</link>
    <description>&lt;P&gt;Thank you for the answer.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The model is located in Unity Catalog like so:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_1-1717485792659.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8018iBDCEB0655C25B48A/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_1-1717485792659.png" alt="quintrix_1-1717485792659.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;The model isn't deployed yet, so I can't check health metrics.&lt;/LI&gt;&lt;LI&gt;I don't use Azure DevOps.&lt;/LI&gt;&lt;LI&gt;I've implemented 5 retries (the first run creates the endpoint, the next ones try to update it), but all of them generate the same error. Each time it seems to fail after a similar period of time:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_2-1717496898846.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8020i8F62624EE7A129B5/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_2-1717496898846.png" alt="quintrix_2-1717496898846.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;If I understand correctly, model serving does not take place on my cluster, where I could set environment variables. Please correct me if I'm wrong. 
I can run endpoint creation with the cluster off using the UI, and&amp;nbsp;none of my clusters are running at this time:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_3-1717498078275.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8021i1DE9AF38A9F8A829/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_3-1717498078275.png" alt="quintrix_3-1717498078275.png" /&gt;&lt;/span&gt;&lt;BR /&gt;So where can I set the variable?&lt;BR /&gt;I've set the variable in code before calling the create_and_wait() method, but I'm not sure that's correct.&lt;/LI&gt;&lt;LI&gt;Any other ideas?&lt;/LI&gt;&lt;LI&gt;What about the conda exceptions during deployment? How could I debug them?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;As a test, I also served a simple linear regression model. That endpoint was created successfully and works fine.&lt;/P&gt;</description>
    <pubDate>Tue, 04 Jun 2024 10:59:15 GMT</pubDate>
    <dc:creator>quintrix</dc:creator>
    <dc:date>2024-06-04T10:59:15Z</dc:date>
    <item>
      <title>DBRX - Serving endpoint failed - update timed out.</title>
      <link>https://community.databricks.com/t5/machine-learning/dbrx-serving-endpoint-failed-update-timed-out/m-p/71400#M3323</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;A href="https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html" target="_blank" rel="noopener"&gt;https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Following this tutorial,&amp;nbsp;I'm trying to serve an endpoint with the DBRX model connected to my data in a Vector DB.&lt;BR /&gt;I can log my model in Databricks with MLflow and call it locally from notebooks without any problem, but when I try to serve the endpoint it fails after about 35-40 minutes with the message:&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;OperationFailed: &lt;/SPAN&gt;&lt;SPAN&gt;failed to reach NOT_UPDATING, got EndpointStateConfigUpdate.UPDATE_FAILED: current status: EndpointStateConfigUpdate.UPDATE_FAILED&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In the create_and_wait() method I set the timeout parameter to two hours, so that the method does not stop after the default 20 minutes, like so:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;w.serving_endpoints.&lt;/SPAN&gt;&lt;SPAN&gt;create_and_wait&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;name&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;serving_endpoint_name, &lt;/SPAN&gt;&lt;SPAN&gt;config&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;endpoint_config, &lt;/SPAN&gt;&lt;SPAN&gt;timeout&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;timedelta&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;hours&lt;/SPAN&gt;&lt;SPAN&gt;=2&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;and the value works properly, but there must be another issue causing the timeout error.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;BR /&gt;Screenshots from the Serving tab in Databricks:&lt;BR /&gt;&lt;/SPAN&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_0-1717395746345.png" style="width: 400px;"&gt;&lt;img 
src="https://community.databricks.com/t5/image/serverpage/image-id/8008iDF6387214A1ADB5B/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_0-1717395746345.png" alt="quintrix_0-1717395746345.png" /&gt;&lt;/span&gt;&lt;P&gt;In the service logs I can also see some exceptions raised by conda:&lt;/P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_1-1717395859283.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8009i6C262E076CF84BC6/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_1-1717395859283.png" alt="quintrix_1-1717395859283.png" /&gt;&lt;/span&gt;&lt;P&gt;Any idea how to solve the issue?&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 03 Jun 2024 06:29:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/dbrx-serving-endpoint-failed-update-timed-out/m-p/71400#M3323</guid>
      <dc:creator>quintrix</dc:creator>
      <dc:date>2024-06-03T06:29:22Z</dc:date>
    </item>
    <item>
      <title>Re: DBRX - Serving endpoint failed - update timed out.</title>
      <link>https://community.databricks.com/t5/machine-learning/dbrx-serving-endpoint-failed-update-timed-out/m-p/71582#M3330</link>
      <description>&lt;P&gt;Thank you for the answer.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The model is located in Unity Catalog like so:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_1-1717485792659.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8018iBDCEB0655C25B48A/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_1-1717485792659.png" alt="quintrix_1-1717485792659.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;The model isn't deployed yet, so I can't check health metrics.&lt;/LI&gt;&lt;LI&gt;I don't use Azure DevOps.&lt;/LI&gt;&lt;LI&gt;I've implemented 5 retries (the first run creates the endpoint, the next ones try to update it), but all of them generate the same error. Each time it seems to fail after a similar period of time:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_2-1717496898846.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8020i8F62624EE7A129B5/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_2-1717496898846.png" alt="quintrix_2-1717496898846.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;If I understand correctly, model serving does not take place on my cluster, where I could set environment variables. Please correct me if I'm wrong. 
I can run endpoint creation with the cluster off using the UI, and&amp;nbsp;none of my clusters are running at this time:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quintrix_3-1717498078275.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/8021i1DE9AF38A9F8A829/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="quintrix_3-1717498078275.png" alt="quintrix_3-1717498078275.png" /&gt;&lt;/span&gt;&lt;BR /&gt;So where can I set the variable?&lt;BR /&gt;I've set the variable in code before calling the create_and_wait() method, but I'm not sure that's correct.&lt;/LI&gt;&lt;LI&gt;Any other ideas?&lt;/LI&gt;&lt;LI&gt;What about the conda exceptions during deployment? How could I debug them?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;As a test, I also served a simple linear regression model. That endpoint was created successfully and works fine.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 10:59:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/dbrx-serving-endpoint-failed-update-timed-out/m-p/71582#M3330</guid>
      <dc:creator>quintrix</dc:creator>
      <dc:date>2024-06-04T10:59:15Z</dc:date>
    </item>
  </channel>
</rss>

