<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: RuntimeError: Expected to mark a variable ready only once error in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/runtimeerror-expected-to-mark-a-variable-ready-only-once-error/m-p/70820#M3284</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/91731"&gt;@saleem_shady&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Have you tried setting the parameter ddp_find_unused_parameters=False in your TrainingArguments? Here's an example of how to include it:&amp;nbsp;&lt;A href="https://github.com/databricks/databricks-ml-examples/blob/master/llm-models/llamav2/llamav2-7b/06_fine_tune_qlora.py#L209" target="_blank"&gt;https://github.com/databricks/databricks-ml-examples/blob/master/llm-models/llamav2/llamav2-7b/06_fine_tune_qlora.py#L209&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;If you have already included this parameter and are still encountering the issue, please share the full error message you are receiving as a reply to this post.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Jéssica Santos&lt;/P&gt;</description>
    <pubDate>Mon, 27 May 2024 21:52:39 GMT</pubDate>
    <dc:creator>jessysantos</dc:creator>
    <dc:date>2024-05-27T21:52:39Z</dc:date>
    <item>
      <title>RuntimeError: Expected to mark a variable ready only once error</title>
      <link>https://community.databricks.com/t5/machine-learning/runtimeerror-expected-to-mark-a-variable-ready-only-once-error/m-p/49378#M2675</link>
      <description>&lt;P&gt;I'm using a single-node cluster with a g5.2xlarge instance to fine-tune a LLaMA-2 model. My notebook runs very smoothly on Google Colab, but when I try to run it on `Databricks`, it throws me the exact error given below:&lt;/P&gt;&lt;P&gt;RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.&lt;BR /&gt;Parameter at index 191 has been marked as ready twice. This means that multiple auto-grad engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.&lt;/P&gt;&lt;P&gt;Here is my code for fine-tuning LLaMA-2, and the&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/77309121/runtimeerror-expected-to-mark-a-variable-ready-only-once-error-in-databricks-no" target="_self"&gt;original issue&lt;/A&gt; on Stack Overflow.&lt;/P&gt;</description>
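A first debugging step, as the error message itself suggests, is to raise torch.distributed's debug level so that DDP prints the name of the parameter that was "marked as ready twice" (index 191 here). A minimal sketch, assuming the variable is set before the process group is initialized:

```python
# Hedged sketch: surface the offending parameter's name by enabling
# torch.distributed's verbose debugging. This must be set before
# torch.distributed (or the Trainer that wraps it) initializes the
# process group, e.g. at the very top of the notebook.
import os

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"  # or "INFO" for less output
```

With DETAIL set, the next occurrence of the RuntimeError should include the parameter name, which helps confirm whether gradient checkpointing is re-running the same module twice in one backward pass.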
      <pubDate>Tue, 17 Oct 2023 13:03:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/runtimeerror-expected-to-mark-a-variable-ready-only-once-error/m-p/49378#M2675</guid>
      <dc:creator>saleem_shady</dc:creator>
      <dc:date>2023-10-17T13:03:50Z</dc:date>
    </item>
    <item>
      <title>Re: RuntimeError: Expected to mark a variable ready only once error</title>
      <link>https://community.databricks.com/t5/machine-learning/runtimeerror-expected-to-mark-a-variable-ready-only-once-error/m-p/70820#M3284</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/91731"&gt;@saleem_shady&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Have you tried setting the parameter ddp_find_unused_parameters=False in your TrainingArguments? Here's an example of how to include it:&amp;nbsp;&lt;A href="https://github.com/databricks/databricks-ml-examples/blob/master/llm-models/llamav2/llamav2-7b/06_fine_tune_qlora.py#L209" target="_blank"&gt;https://github.com/databricks/databricks-ml-examples/blob/master/llm-models/llamav2/llamav2-7b/06_fine_tune_qlora.py#L209&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;If you have already included this parameter and are still encountering the issue, please share the full error message you are receiving as a reply to this post.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Jéssica Santos&lt;/P&gt;</description>
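A minimal sketch of the suggestion above, assuming a Hugging Face transformers Trainer setup; the output path and batch-size values are placeholders, not taken from the original post:

```python
# Hedged sketch: pass ddp_find_unused_parameters=False so DDP does not
# traverse the autograd graph looking for unused parameters, which can
# conflict with gradient checkpointing and trigger the
# "Expected to mark a variable ready only once" error.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/local_disk0/output",      # placeholder path
    per_device_train_batch_size=4,         # placeholder value
    gradient_checkpointing=True,
    ddp_find_unused_parameters=False,      # the fix suggested in this reply
)
```

This mirrors the linked Databricks QLoRA example, which sets the same flag in its TrainingArguments.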
      <pubDate>Mon, 27 May 2024 21:52:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/runtimeerror-expected-to-mark-a-variable-ready-only-once-error/m-p/70820#M3284</guid>
      <dc:creator>jessysantos</dc:creator>
      <dc:date>2024-05-27T21:52:39Z</dc:date>
    </item>
  </channel>
</rss>

