<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to fix this runtime error in this Databricks distributed training tutorial workbook in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/41057#M2073</link>
    <description>Databricks Community thread: a TorchDistributor run from the distributed fine-tuning tutorial notebook fails with "RuntimeError: TorchDistributor failed during training"; the stdout trace shows MLflow raising databricks_cli InvalidConfigurationError.</description>
    <pubDate>Tue, 22 Aug 2023 20:38:44 GMT</pubDate>
    <dc:creator>AChang</dc:creator>
    <dc:date>2023-08-22T20:38:44Z</dc:date>
    <item>
      <title>How to fix this runtime error in this Databricks distributed training tutorial workbook</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/41057#M2073</link>
      <description>&lt;P&gt;I am following along with this &lt;A href="https://docs.databricks.com/en/_extras/notebooks/source/deep-learning/distributed-fine-tuning-hugging-face.html" target="_blank" rel="noopener"&gt;notebook&lt;/A&gt;&amp;nbsp;linked from this &lt;A href="https://docs.databricks.com/en/machine-learning/train-model/distributed-training/spark-pytorch-distributor.html" target="_blank" rel="noopener"&gt;article&lt;/A&gt;. I am attempting to fine-tune the model with a single node and multiple GPUs, so I run everything up to the "Run Local Training" section, but from there I skip to "Run distributed training on a single node with multiple GPUs". When I run that first block, though, I get this error:&lt;/P&gt;&lt;P&gt;`RuntimeError: TorchDistributor failed during training. View stdout logs for detailed error message.`&lt;/P&gt;&lt;P&gt;Here is the full output I see from the code block:&lt;BR /&gt;```&lt;BR /&gt;We're using 4 GPUs&lt;BR /&gt;Started local training with 4 processes&lt;BR /&gt;WARNING:__main__:&lt;BR /&gt;*****************************************&lt;BR /&gt;Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.&lt;BR /&gt;*****************************************&lt;BR /&gt;2023-08-22 19:31:47.794586: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA&lt;BR /&gt;To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.&lt;BR /&gt;2023-08-22 19:31:47.809864: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA&lt;BR 
/&gt;To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.&lt;BR /&gt;2023-08-22 19:31:47.824423: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA&lt;BR /&gt;To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.&lt;BR /&gt;2023-08-22 19:31:47.828933: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA&lt;BR /&gt;To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.&lt;BR /&gt;/databricks/python/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning&lt;BR /&gt;warnings.warn(&lt;BR /&gt;/databricks/python/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning&lt;BR /&gt;warnings.warn(&lt;BR /&gt;/databricks/python/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning&lt;BR /&gt;warnings.warn(&lt;BR /&gt;/databricks/python/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning&lt;BR /&gt;warnings.warn(&lt;BR /&gt;You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.&lt;BR /&gt;You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.&lt;BR /&gt;You're using a DistilBertTokenizerFast tokenizer. 
Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.&lt;BR /&gt;Traceback (most recent call last):&lt;BR /&gt;File "/tmp/tmpz1ss252g/train.py", line 8, in &amp;lt;module&amp;gt;&lt;BR /&gt;output = train_fn(*args)&lt;BR /&gt;File "&amp;lt;command-2821949673242075&amp;gt;", line 46, in train_model&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train&lt;BR /&gt;return inner_training_loop(&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/trainer.py", line 1855, in _inner_training_loop&lt;BR /&gt;self.control = self.callback_handler.on_train_begin(args, self.state, self.control)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin&lt;BR /&gt;return self.call_event("on_train_begin", args, state, control)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event&lt;BR /&gt;result = getattr(callback, event)(&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/integrations.py", line 1021, in on_train_begin&lt;BR /&gt;self.setup(args, state, model)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/transformers/integrations.py", line 990, in setup&lt;BR /&gt;self._ml_flow.start_run(run_name=args.run_name, nested=self._nested_run)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 363, in start_run&lt;BR /&gt;active_run_obj = client.create_run(&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/mlflow/tracking/client.py", line 326, in create_run&lt;BR /&gt;return self._tracking_client.create_run(experiment_id, start_time, tags, run_name)&lt;BR /&gt;File 
"/databricks/python/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 133, in create_run&lt;BR /&gt;return self.store.create_run(&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 178, in create_run&lt;BR /&gt;response_proto = self._call_endpoint(CreateRun, req_body)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 59, in _call_endpoint&lt;BR /&gt;return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/mlflow/utils/databricks_utils.py", line 422, in get_databricks_host_creds&lt;BR /&gt;config = provider.get_config()&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/databricks_cli/configure/provider.py", line 134, in get_config&lt;BR /&gt;raise InvalidConfigurationError.for_profile(None)&lt;BR /&gt;databricks_cli.utils.InvalidConfigurationError: You haven't configured the CLI yet! 
Please configure by entering `/tmp/tmpz1ss252g/train.py configure`&lt;BR /&gt;WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2572 closing signal SIGTERM&lt;BR /&gt;WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2573 closing signal SIGTERM&lt;BR /&gt;WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2574 closing signal SIGTERM&lt;BR /&gt;ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2571) of binary: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3b3dff80-496a-4c7d-9684-b04a17a299d3/bin/python&lt;BR /&gt;Traceback (most recent call last):&lt;BR /&gt;File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main&lt;BR /&gt;return _run_code(code, main_globals, None,&lt;BR /&gt;File "/usr/lib/python3.10/runpy.py", line 86, in _run_code&lt;BR /&gt;exec(code, run_globals)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/run.py", line 766, in &amp;lt;module&amp;gt;&lt;BR /&gt;main()&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper&lt;BR /&gt;return f(*args, **kwargs)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main&lt;BR /&gt;run(args)&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run&lt;BR /&gt;elastic_launch(&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__&lt;BR /&gt;return launch_agent(self._config, self._entrypoint, list(args))&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent&lt;BR /&gt;raise ChildFailedError(&lt;BR /&gt;torch.distributed.elastic.multiprocessing.errors.ChildFailedError:&lt;BR 
/&gt;============================================================&lt;BR /&gt;/tmp/tmpz1ss252g/train.py FAILED&lt;BR /&gt;------------------------------------------------------------&lt;BR /&gt;Failures:&lt;BR /&gt;&amp;lt;NO_OTHER_FAILURES&amp;gt;&lt;BR /&gt;------------------------------------------------------------&lt;BR /&gt;Root Cause (first observed failure):&lt;BR /&gt;[0]:&lt;BR /&gt;time : 2023-08-22_19:31:58&lt;BR /&gt;host : 0821-144503-em46c4jc-10-52-237-200&lt;BR /&gt;rank : 0 (local_rank: 0)&lt;BR /&gt;exitcode : 1 (pid: 2571)&lt;BR /&gt;error_file: &amp;lt;N/A&amp;gt;&lt;BR /&gt;traceback : To enable traceback see: &lt;A href="https://pytorch.org/docs/stable/elastic/errors.html" target="_blank"&gt;https://pytorch.org/docs/stable/elastic/errors.html&lt;/A&gt;&lt;BR /&gt;============================================================&lt;BR /&gt;```&lt;/P&gt;&lt;P&gt;Do I need to enable more traceback to see more of the error? Do I need to 'configure the CLI', whatever that means? Is there something extremely obvious I'm just missing?&lt;/P&gt;&lt;P&gt;I am using a g5.12xlarge with 4 GPUs, and my Databricks runtime version is '13.2 ML (includes Apache Spark 3.4.0, GPU, Scala 2.12)'. I'm running this from within a Databricks notebook.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Aug 2023 20:38:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/41057#M2073</guid>
      <dc:creator>AChang</dc:creator>
      <dc:date>2023-08-22T20:38:44Z</dc:date>
    </item>
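The stdout trace in the post above points at the actual failure: the Hugging Face Trainer's MLflow callback calls mlflow.start_run() inside each worker process spawned by TorchDistributor, and those subprocesses do not inherit the notebook's Databricks credentials, so databricks_cli raises InvalidConfigurationError. A commonly suggested workaround, sketched here under assumptions and not confirmed in this thread, is to export DATABRICKS_HOST and DATABRICKS_TOKEN inside the training function before the Trainer starts. The wrapper and the stand-in train_model below are hypothetical; in a notebook the host and token would come from the workspace (for example via dbutils), which is not shown here:

```python
import os

def with_databricks_creds(train_fn, host, token):
    """Wrap a training function so that each process spawned by
    TorchDistributor sets the Databricks credentials MLflow needs.
    MLflow's Databricks credential provider reads these environment
    variables before falling back to the (unconfigured) CLI profile."""
    def wrapped(*args, **kwargs):
        os.environ["DATABRICKS_HOST"] = host
        os.environ["DATABRICKS_TOKEN"] = token
        return train_fn(*args, **kwargs)
    return wrapped

# Stand-in for the tutorial's train_model, just to show the call shape.
def train_model(lr):
    return f"trained with lr={lr}"

safe_train = with_databricks_creds(
    train_model,
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    "dapi-placeholder-token",                 # placeholder PAT
)
print(safe_train(1e-5))
```

Alternatively, if MLflow logging is not needed from the worker processes, the credential lookup can be avoided entirely by passing `report_to=[]` to the transformers `TrainingArguments`, which disables the MLflow callback.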
    <item>
      <title>Re: How to fix this runtime error in this Databricks distributed training tutorial workbook</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/66283#M3195</link>
      <description>&lt;P&gt;Hi AChang, did you eventually resolve the error? I'm also having the same error.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 14:45:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/66283#M3195</guid>
      <dc:creator>KYX</dc:creator>
      <dc:date>2024-04-15T14:45:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to fix this runtime error in this Databricks distributed training tutorial workbook</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/66285#M3196</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103597"&gt;@KYX&lt;/a&gt;, I don't believe I ever did. You can try configuring the CLI from the ephemeral terminal in the notebook, but it really shouldn't be necessary, so I think something else is going on.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 14:50:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-fix-this-runtime-error-in-this-databricks-distributed/m-p/66285#M3196</guid>
      <dc:creator>AChang</dc:creator>
      <dc:date>2024-04-15T14:50:35Z</dc:date>
    </item>
  </channel>
</rss>

