<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Error "Distributed package doesn't have nccl built in" with Transformers Library. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/2843#M115</link>
    <description>&lt;P&gt;I am trying to run a simple training script using HF's transformers library and am running into the error `Distributed package doesn't have NCCL built in`.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Runtime: DBR 13.0 ML - Spark 3.4.0 - Scala 2.12&lt;/P&gt;&lt;P&gt;Driver: i3.xlarge - 4 cores&lt;/P&gt;&lt;P&gt;Note: This is a CPU instance&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am trying to fine-tune a transformers model for Sequence Classification, essentially following this tutorial: &lt;A href="https://huggingface.co/docs/transformers/training" alt="https://huggingface.co/docs/transformers/training" target="_blank"&gt;https://huggingface.co/docs/transformers/training&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I try to initialize TrainingArguments (TrainingArguments(output_dir="test_trainer")), I get the following error trace:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File &amp;lt;command-1074749622305054&amp;gt;:3
      1 from transformers import  TrainingArguments
----&amp;gt; 3 TrainingArguments(output_dir="test_trainer")
&amp;nbsp;
File &amp;lt;string&amp;gt;:108, in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, eval_delay, learning_rate, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, log_level, log_level_replica, log_on_each_node, logging_dir, logging_strategy, logging_first_step, logging_steps, logging_nan_inf_filter, save_strategy, save_steps, save_total_limit, save_on_each_node, no_cuda, use_mps_device, seed, data_seed, jit_mode_eval, use_ipex, bf16, fp16, fp16_opt_level, half_precision_backend, bf16_full_eval, fp16_full_eval, tf32, local_rank, xpu_backend, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, fsdp, fsdp_min_num_params, fsdp_transformer_layer_cls_to_wrap, deepspeed, label_smoothing_factor, optim, optim_args, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, ddp_bucket_cap_mb, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, hub_model_id, hub_strategy, hub_token, hub_private_repo, gradient_checkpointing, include_inputs_for_metrics, fp16_backend, push_to_hub_model_id, push_to_hub_organization, push_to_hub_token, mp_parameters, auto_find_batch_size, full_determinism, torchdynamo, ray_scope, ddp_timeout, torch_compile, torch_compile_backend, torch_compile_mode)
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1172, in TrainingArguments.__post_init__(self)
   1162     warnings.warn(
   1163         "`--adafactor` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--optim"
   1164         " adafactor` instead",
   1165         FutureWarning,
   1166     )
   1167     self.optim = OptimizerNames.ADAFACTOR
   1169 if (
   1170     self.framework == "pt"
   1171     and is_torch_available()
-&amp;gt; 1172     and (self.device.type != "cuda")
   1173     and (get_xla_device_type(self.device) != "GPU")
   1174     and (self.fp16 or self.fp16_full_eval)
   1175 ):
   1176     raise ValueError(
   1177         "FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation"
   1178         " (`--fp16_full_eval`) can only be used on CUDA devices."
   1179     )
   1181 if (
   1182     self.framework == "pt"
   1183     and is_torch_available()
   (...)
   1188     and (self.bf16 or self.bf16_full_eval)
   1189 ):
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1556, in TrainingArguments.device(self)
   1552 """
   1553 The device used by this process.
   1554 """
   1555 requires_backends(self, ["torch"])
-&amp;gt; 1556 return self._setup_devices
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/utils/generic.py:57, in cached_property.__get__(self, obj, objtype)
     55 cached = getattr(obj, attr, None)
     56 if cached is None:
---&amp;gt; 57     cached = self.fget(obj)
     58     setattr(obj, attr, cached)
     59 return cached
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1541, in TrainingArguments._setup_devices(self)
   1537 else:
   1538     # Here, we'll use torch.distributed.
   1539     # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
   1540     if not torch.distributed.is_initialized():
-&amp;gt; 1541         torch.distributed.init_process_group(backend="nccl", timeout=self.ddp_timeout_delta)
   1542     device = torch.device("cuda", self.local_rank)
   1543     self._n_gpu = 1
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:761, in init_process_group(backend, init_method, timeout, world_size, rank, store, group_name, pg_options)
    757         # Use a PrefixStore to avoid accidental overrides of keys used by
    758         # different systems (e.g. RPC) in case the store is multi-tenant.
    759         store = PrefixStore("default_pg", store)
--&amp;gt; 761     default_pg = _new_process_group_helper(
    762         world_size,
    763         rank,
    764         [],
    765         backend,
    766         store,
    767         pg_options=pg_options,
    768         group_name=group_name,
    769         timeout=timeout,
    770     )
    771     _update_default_pg(default_pg)
    773 _pg_group_ranks[GroupMember.WORLD] = {i: i for i in range(GroupMember.WORLD.size())}  # type: ignore[attr-defined, index]
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:886, in _new_process_group_helper(group_size, group_rank, global_ranks_in_group, backend, store, pg_options, group_name, timeout)
    884 elif backend == Backend.NCCL:
    885     if not is_nccl_available():
--&amp;gt; 886         raise RuntimeError("Distributed package doesn't have NCCL " "built in")
    887     if pg_options is not None:
    888         assert isinstance(
    889             pg_options, ProcessGroupNCCL.Options
    890         ), "Expected pg_options argument to be of type ProcessGroupNCCL.Options"
&amp;nbsp;
RuntimeError: Distributed package doesn't have NCCL built in&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;I tried the following fix, but it had no effect:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I cannot find any other pointers.&lt;/P&gt;&lt;P&gt;Can anyone please suggest what may be going on?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Jun 2023 15:02:58 GMT</pubDate>
    <dc:creator>anastassia_kor1</dc:creator>
    <dc:date>2023-06-19T15:02:58Z</dc:date>
    <item>
      <title>Error "Distributed package doesn't have nccl built in" with Transformers Library.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/2843#M115</link>
      <description>&lt;P&gt;I am trying to run a simple training script using HF's transformers library and am running into the error `Distributed package doesn't have NCCL built in`.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Runtime: DBR 13.0 ML - Spark 3.4.0 - Scala 2.12&lt;/P&gt;&lt;P&gt;Driver: i3.xlarge - 4 cores&lt;/P&gt;&lt;P&gt;Note: This is a CPU instance&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am trying to fine-tune a transformers model for Sequence Classification, essentially following this tutorial: &lt;A href="https://huggingface.co/docs/transformers/training" alt="https://huggingface.co/docs/transformers/training" target="_blank"&gt;https://huggingface.co/docs/transformers/training&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I try to initialize TrainingArguments (TrainingArguments(output_dir="test_trainer")), I get the following error trace:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File &amp;lt;command-1074749622305054&amp;gt;:3
      1 from transformers import  TrainingArguments
----&amp;gt; 3 TrainingArguments(output_dir="test_trainer")
&amp;nbsp;
File &amp;lt;string&amp;gt;:108, in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, eval_delay, learning_rate, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, log_level, log_level_replica, log_on_each_node, logging_dir, logging_strategy, logging_first_step, logging_steps, logging_nan_inf_filter, save_strategy, save_steps, save_total_limit, save_on_each_node, no_cuda, use_mps_device, seed, data_seed, jit_mode_eval, use_ipex, bf16, fp16, fp16_opt_level, half_precision_backend, bf16_full_eval, fp16_full_eval, tf32, local_rank, xpu_backend, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, fsdp, fsdp_min_num_params, fsdp_transformer_layer_cls_to_wrap, deepspeed, label_smoothing_factor, optim, optim_args, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, ddp_bucket_cap_mb, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, hub_model_id, hub_strategy, hub_token, hub_private_repo, gradient_checkpointing, include_inputs_for_metrics, fp16_backend, push_to_hub_model_id, push_to_hub_organization, push_to_hub_token, mp_parameters, auto_find_batch_size, full_determinism, torchdynamo, ray_scope, ddp_timeout, torch_compile, torch_compile_backend, torch_compile_mode)
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1172, in TrainingArguments.__post_init__(self)
   1162     warnings.warn(
   1163         "`--adafactor` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--optim"
   1164         " adafactor` instead",
   1165         FutureWarning,
   1166     )
   1167     self.optim = OptimizerNames.ADAFACTOR
   1169 if (
   1170     self.framework == "pt"
   1171     and is_torch_available()
-&amp;gt; 1172     and (self.device.type != "cuda")
   1173     and (get_xla_device_type(self.device) != "GPU")
   1174     and (self.fp16 or self.fp16_full_eval)
   1175 ):
   1176     raise ValueError(
   1177         "FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation"
   1178         " (`--fp16_full_eval`) can only be used on CUDA devices."
   1179     )
   1181 if (
   1182     self.framework == "pt"
   1183     and is_torch_available()
   (...)
   1188     and (self.bf16 or self.bf16_full_eval)
   1189 ):
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1556, in TrainingArguments.device(self)
   1552 """
   1553 The device used by this process.
   1554 """
   1555 requires_backends(self, ["torch"])
-&amp;gt; 1556 return self._setup_devices
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/utils/generic.py:57, in cached_property.__get__(self, obj, objtype)
     55 cached = getattr(obj, attr, None)
     56 if cached is None:
---&amp;gt; 57     cached = self.fget(obj)
     58     setattr(obj, attr, cached)
     59 return cached
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/transformers/training_args.py:1541, in TrainingArguments._setup_devices(self)
   1537 else:
   1538     # Here, we'll use torch.distributed.
   1539     # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
   1540     if not torch.distributed.is_initialized():
-&amp;gt; 1541         torch.distributed.init_process_group(backend="nccl", timeout=self.ddp_timeout_delta)
   1542     device = torch.device("cuda", self.local_rank)
   1543     self._n_gpu = 1
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:761, in init_process_group(backend, init_method, timeout, world_size, rank, store, group_name, pg_options)
    757         # Use a PrefixStore to avoid accidental overrides of keys used by
    758         # different systems (e.g. RPC) in case the store is multi-tenant.
    759         store = PrefixStore("default_pg", store)
--&amp;gt; 761     default_pg = _new_process_group_helper(
    762         world_size,
    763         rank,
    764         [],
    765         backend,
    766         store,
    767         pg_options=pg_options,
    768         group_name=group_name,
    769         timeout=timeout,
    770     )
    771     _update_default_pg(default_pg)
    773 _pg_group_ranks[GroupMember.WORLD] = {i: i for i in range(GroupMember.WORLD.size())}  # type: ignore[attr-defined, index]
&amp;nbsp;
File /databricks/python/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:886, in _new_process_group_helper(group_size, group_rank, global_ranks_in_group, backend, store, pg_options, group_name, timeout)
    884 elif backend == Backend.NCCL:
    885     if not is_nccl_available():
--&amp;gt; 886         raise RuntimeError("Distributed package doesn't have NCCL " "built in")
    887     if pg_options is not None:
    888         assert isinstance(
    889             pg_options, ProcessGroupNCCL.Options
    890         ), "Expected pg_options argument to be of type ProcessGroupNCCL.Options"
&amp;nbsp;
RuntimeError: Distributed package doesn't have NCCL built in&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;I tried the following fix, but it had no effect:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I cannot find any other pointers.&lt;/P&gt;&lt;P&gt;Can anyone please suggest what may be going on?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jun 2023 15:02:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/2843#M115</guid>
      <dc:creator>anastassia_kor1</dc:creator>
      <dc:date>2023-06-19T15:02:58Z</dc:date>
    </item>
    <item>
      <title>Re: Error "Distributed package doesn't have nccl built in" with Transformers Library.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/2844#M116</link>
      <description>&lt;P&gt;Hi @Anastassia Kornilova,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jun 2023 04:40:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/2844#M116</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-20T04:40:37Z</dc:date>
    </item>
    <item>
      <title>Re: Error "Distributed package doesn't have nccl built in" with Transformers Library.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/37811#M26465</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/74886"&gt;@anastassia_kor1&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;For CPU-only training, set the no_cuda flag in TrainingArguments.&lt;/P&gt;&lt;P&gt;For transformers==4.26.1 (MLR 13.0) and transformers==4.28.1 (MLR 13.1), the xpu_backend argument needs to be set as well. Try using:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;training_args = TrainingArguments(output_dir="outputs", no_cuda=True, xpu_backend="gloo")&lt;/LI-CODE&gt;&lt;P&gt;For transformers==4.29.2 (MLR 13.2), try using:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;training_args = TrainingArguments(output_dir="outputs", no_cuda=True)&lt;/LI-CODE&gt;&lt;P&gt;It may be necessary to restart the cluster for these arguments to take effect.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jul 2023 21:44:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-quot-distributed-package-doesn-t-have-nccl-built-in-quot/m-p/37811#M26465</guid>
      <dc:creator>patputnam-db</dc:creator>
      <dc:date>2023-07-17T21:44:37Z</dc:date>
    </item>
  </channel>
</rss>

