<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Distributed SparkXGBRanker training: failed barrier ResultStage in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/120569#M4094</link>
    <description>&lt;P&gt;I've also tried upgrading to &lt;SPAN class=""&gt;16.4 LTS ML (includes Apache Spark 3.5.2, Scala 2.12).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;Full error below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;```&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;2025-05-29 17:13:53,553 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 76 workers with booster params: {'objective': 'rank:ndcg', 'colsample_bytree': 0.8, 'device': 'cpu', 'gamma': 4, 'grow_policy': 'lossguide', 'max_depth': 8, 'max_leaves': 128, 'min_child_weight': 6, 'alpha': 1, 'eta': 0.5, 'lambda': 3, 'num_round': 600, 'eval_metric': 'ndcg@5', 'nthread': 1} train_call_kwargs_params: {'early_stopping_rounds': 30, 'verbose_eval': False, 'num_boost_round': 100} dmatrix_kwargs: {'nthread': 1, 'missing': nan}&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Py4JJavaError: &lt;/SPAN&gt;An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(14, 3) finished unsuccessfully. 
org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py", line 1082, in _train_booster dtrain, dvalid = create_dmatrix_from_partitions( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/data.py", line 312, in create_dmatrix_from_partitions cache_partitions(iterator, append_fn) File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/data.py", line 59, in cache_partitions train = part.loc[~part[alias.valid], :] ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1067, in __getitem__ return self._getitem_tuple(key) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1256, in _getitem_tuple return self._getitem_tuple_same_dim(tup) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 924, in _getitem_tuple_same_dim retval = getattr(retval, self.name)._getitem_axis(key, axis=i) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis return self._getitem_iterable(key, axis=axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer keyarr, indexer = ax._get_indexer_strict(key, axis_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6070, in _get_indexer_strict self._raise_if_missing(keyarr, indexer, axis_name) File 
"/databricks/python/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6130, in _raise_if_missing raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Int64Index([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n ...\n -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],\n dtype='int64', length=10000)] are in the [index]" at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:851) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:800) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:90) at org.apache.spark.api.python.PythonRDD$.writeNextElementToStream(PythonRDD.scala:504) at org.apache.spark.api.python.PythonRunner$$anon$2.writeNextInputToStream(PythonRunner.scala:1283) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:1191) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:1105) at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244) at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263) at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:1310) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:1302) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:800) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at 
scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1113) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$2(ResultTask.scala:76) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:76) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:227) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:204) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:166) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104) at 
com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:160) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:105) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$11(Executor.scala:1228) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:111) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1232) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:1088) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) at org.apache.spark.scheduler.DAGScheduler.$anonfun$failJobAndIndependentStages$1(DAGScheduler.scala:4472) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:4470) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:4382) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:4369) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:4369) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:3737) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4730) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.liftedTree1$1(DAGScheduler.scala:4634) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4633) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4619) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:55) at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1512) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1498) at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3254) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1111) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:461) at org.apache.spark.rdd.RDD.collect(RDD.scala:1109) at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:319) at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397) at py4j.Gateway.invoke(Gateway.java:306) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:197) at py4j.ClientServerConnection.run(ClientServerConnection.java:117) at java.base/java.lang.Thread.run(Thread.java:840)&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;&lt;A target="_blank"&gt;&amp;lt;command-5246778553375285&amp;gt;&lt;/A&gt;, line 1&lt;/SPAN&gt; &lt;SPAN class=""&gt;----&amp;gt; 1&lt;/SPAN&gt; pipeline_model &lt;SPAN&gt;=&lt;/SPAN&gt; pipeline&lt;SPAN&gt;.&lt;/SPAN&gt;fit(exploded) &lt;SPAN&gt;2&lt;/SPAN&gt; ranker_model &lt;SPAN&gt;=&lt;/SPAN&gt; pipeline_model&lt;SPAN&gt;.&lt;/SPAN&gt;stages[&lt;SPAN&gt;-&lt;/SPAN&gt;&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;4&lt;/SPAN&gt; native_booster &lt;SPAN&gt;=&lt;/SPAN&gt; ranker_model&lt;SPAN&gt;.&lt;/SPAN&gt;get_booster()&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python_shell/lib/dbruntime/MLWorkloadsInstrumentation/_pyspark.py:30&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_create_patch_function.&amp;lt;locals&amp;gt;.patched_method&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;28&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;False&lt;/SPAN&gt; &lt;SPAN&gt;29&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 30&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; original_method(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;31&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;True&lt;/SPAN&gt; &lt;SPAN&gt;32&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; result&lt;/DIV&gt;&lt;DIV 
class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:483&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;479&lt;/SPAN&gt; call_original &lt;SPAN&gt;=&lt;/SPAN&gt; update_wrapper_extended(call_original, original) &lt;SPAN&gt;481&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_start(args, kwargs) &lt;SPAN class=""&gt;--&amp;gt; 483&lt;/SPAN&gt; patch_function(call_original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;485&lt;/SPAN&gt; session&lt;SPAN&gt;.&lt;/SPAN&gt;state &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;succeeded&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;486&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_success(args, kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:182&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;with_managed_run.&amp;lt;locals&amp;gt;.patch_with_managed_run&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;179&lt;/SPAN&gt; managed_run &lt;SPAN&gt;=&lt;/SPAN&gt; create_managed_run() &lt;SPAN&gt;181&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 182&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; patch_function(original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;183&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; (&lt;SPAN class=""&gt;Exception&lt;/SPAN&gt;, &lt;SPAN class=""&gt;KeyboardInterrupt&lt;/SPAN&gt;): &lt;SPAN&gt;184&lt;/SPAN&gt; &lt;SPAN&gt;# In addition to standard Python exceptions, handle 
keyboard interrupts to ensure&lt;/SPAN&gt; &lt;SPAN&gt;185&lt;/SPAN&gt; &lt;SPAN&gt;# that runs are terminated if a user prematurely interrupts training execution&lt;/SPAN&gt; &lt;SPAN&gt;186&lt;/SPAN&gt; &lt;SPAN&gt;# (e.g. via sigint / ctrl-c)&lt;/SPAN&gt; &lt;SPAN&gt;187&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; managed_run:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1172&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.patched_fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1170&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; t&lt;SPAN&gt;.&lt;/SPAN&gt;should_log(): &lt;SPAN&gt;1171&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; _AUTOLOGGING_METRICS_MANAGER&lt;SPAN&gt;.&lt;/SPAN&gt;disable_log_post_training_metrics(): &lt;SPAN class=""&gt;-&amp;gt; 1172&lt;/SPAN&gt; fit_result &lt;SPAN&gt;=&lt;/SPAN&gt; fit_mlflow(original, &lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;1173&lt;/SPAN&gt; &lt;SPAN&gt;# In some cases the `fit_result` may be an iterator of spark models.&lt;/SPAN&gt; &lt;SPAN&gt;1174&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; should_log_post_training_metrics &lt;SPAN class=""&gt;and&lt;/SPAN&gt; &lt;SPAN&gt;isinstance&lt;/SPAN&gt;(fit_result, Model):&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1158&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.fit_mlflow&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1156&lt;/SPAN&gt; input_training_df &lt;SPAN&gt;=&lt;/SPAN&gt; args[&lt;SPAN&gt;0&lt;/SPAN&gt;]&lt;SPAN&gt;.&lt;/SPAN&gt;persist(StorageLevel&lt;SPAN&gt;.&lt;/SPAN&gt;MEMORY_AND_DISK) &lt;SPAN&gt;1157&lt;/SPAN&gt; 
_log_pretraining_metadata(estimator, params, input_training_df) &lt;SPAN class=""&gt;-&amp;gt; 1158&lt;/SPAN&gt; spark_model &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;1159&lt;/SPAN&gt; _log_posttraining_metadata(estimator, spark_model, params, input_training_df) &lt;SPAN&gt;1160&lt;/SPAN&gt; input_training_df&lt;SPAN&gt;.&lt;/SPAN&gt;unpersist()&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:474&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*og_args, **og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result &lt;SPAN class=""&gt;--&amp;gt; 474&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:425&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original_fn_with_event_logging&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original_fn, og_args, og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;422&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN&gt;423&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_start(og_args, og_kwargs) &lt;SPAN class=""&gt;--&amp;gt; 425&lt;/SPAN&gt; original_fn_result &lt;SPAN&gt;=&lt;/SPAN&gt; original_fn(&lt;SPAN&gt;*&lt;/SPAN&gt;og_args, 
&lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;og_kwargs) &lt;SPAN&gt;427&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_success(og_args, og_kwargs) &lt;SPAN&gt;428&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_fn_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:471&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original.&amp;lt;locals&amp;gt;._original_fn&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*_og_args, **_og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;463&lt;/SPAN&gt; &lt;SPAN&gt;# Show all non-MLflow warnings as normal (i.e. not as event logs)&lt;/SPAN&gt; &lt;SPAN&gt;464&lt;/SPAN&gt; &lt;SPAN&gt;# during original function execution, even if silent mode is enabled&lt;/SPAN&gt; &lt;SPAN&gt;465&lt;/SPAN&gt; &lt;SPAN&gt;# (`silent=True`), since these warnings originate from the ML framework&lt;/SPAN&gt; &lt;SPAN&gt;466&lt;/SPAN&gt; &lt;SPAN&gt;# or one of its dependencies and are likely relevant to the caller&lt;/SPAN&gt; &lt;SPAN&gt;467&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; NonMlflowWarningsBehaviorForCurrentThread( &lt;SPAN&gt;468&lt;/SPAN&gt; disable_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;469&lt;/SPAN&gt; reroute_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;470&lt;/SPAN&gt; ): &lt;SPAN class=""&gt;--&amp;gt; 471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/base.py:203&lt;/SPAN&gt;, in 
&lt;SPAN class=""&gt;Estimator.fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset, params)&lt;/SPAN&gt; &lt;SPAN&gt;201&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;copy(params)&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;202&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 203&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;204&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;205&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; &lt;SPAN class=""&gt;TypeError&lt;/SPAN&gt;( &lt;SPAN&gt;206&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Params must be either a param map or a list/tuple of param maps, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;207&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;but got &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;%&lt;/SPAN&gt; &lt;SPAN&gt;type&lt;/SPAN&gt;(params) &lt;SPAN&gt;208&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/pipeline.py:136&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;Pipeline._fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset)&lt;/SPAN&gt; &lt;SPAN&gt;134&lt;/SPAN&gt; dataset &lt;SPAN&gt;=&lt;/SPAN&gt; stage&lt;SPAN&gt;.&lt;/SPAN&gt;transform(dataset) &lt;SPAN&gt;135&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;# must be an Estimator&lt;/SPAN&gt; &lt;SPAN class=""&gt;--&amp;gt; 136&lt;/SPAN&gt; model &lt;SPAN&gt;=&lt;/SPAN&gt; stage&lt;SPAN&gt;.&lt;/SPAN&gt;fit(dataset) &lt;SPAN&gt;137&lt;/SPAN&gt; transformers&lt;SPAN&gt;.&lt;/SPAN&gt;append(model) &lt;SPAN&gt;138&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; i &lt;SPAN&gt;&amp;lt;&lt;/SPAN&gt; indexOfLastEstimator:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN 
class=""&gt;/databricks/python_shell/lib/dbruntime/MLWorkloadsInstrumentation/_pyspark.py:30&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_create_patch_function.&amp;lt;locals&amp;gt;.patched_method&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;28&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;False&lt;/SPAN&gt; &lt;SPAN&gt;29&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 30&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; original_method(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;31&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;True&lt;/SPAN&gt; &lt;SPAN&gt;32&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:483&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;479&lt;/SPAN&gt; call_original &lt;SPAN&gt;=&lt;/SPAN&gt; update_wrapper_extended(call_original, original) &lt;SPAN&gt;481&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_start(args, kwargs) &lt;SPAN class=""&gt;--&amp;gt; 483&lt;/SPAN&gt; patch_function(call_original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;485&lt;/SPAN&gt; session&lt;SPAN&gt;.&lt;/SPAN&gt;state &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;succeeded&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;486&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_success(args, kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN 
class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:182&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;with_managed_run.&amp;lt;locals&amp;gt;.patch_with_managed_run&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;179&lt;/SPAN&gt; managed_run &lt;SPAN&gt;=&lt;/SPAN&gt; create_managed_run() &lt;SPAN&gt;181&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 182&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; patch_function(original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;183&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; (&lt;SPAN class=""&gt;Exception&lt;/SPAN&gt;, &lt;SPAN class=""&gt;KeyboardInterrupt&lt;/SPAN&gt;): &lt;SPAN&gt;184&lt;/SPAN&gt; &lt;SPAN&gt;# In addition to standard Python exceptions, handle keyboard interrupts to ensure&lt;/SPAN&gt; &lt;SPAN&gt;185&lt;/SPAN&gt; &lt;SPAN&gt;# that runs are terminated if a user prematurely interrupts training execution&lt;/SPAN&gt; &lt;SPAN&gt;186&lt;/SPAN&gt; &lt;SPAN&gt;# (e.g. 
via sigint / ctrl-c)&lt;/SPAN&gt; &lt;SPAN&gt;187&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; managed_run:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1180&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.patched_fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1178&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; fit_result &lt;SPAN&gt;1179&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;-&amp;gt; 1180&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:474&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*og_args, **og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result &lt;SPAN class=""&gt;--&amp;gt; 474&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:425&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original_fn_with_event_logging&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original_fn, og_args, og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;422&lt;/SPAN&gt; &lt;SPAN 
class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN&gt;423&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_start(og_args, og_kwargs) &lt;SPAN class=""&gt;--&amp;gt; 425&lt;/SPAN&gt; original_fn_result &lt;SPAN&gt;=&lt;/SPAN&gt; original_fn(&lt;SPAN&gt;*&lt;/SPAN&gt;og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;og_kwargs) &lt;SPAN&gt;427&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_success(og_args, og_kwargs) &lt;SPAN&gt;428&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_fn_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:471&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original.&amp;lt;locals&amp;gt;._original_fn&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*_og_args, **_og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;463&lt;/SPAN&gt; &lt;SPAN&gt;# Show all non-MLflow warnings as normal (i.e. 
not as event logs)&lt;/SPAN&gt; &lt;SPAN&gt;464&lt;/SPAN&gt; &lt;SPAN&gt;# during original function execution, even if silent mode is enabled&lt;/SPAN&gt; &lt;SPAN&gt;465&lt;/SPAN&gt; &lt;SPAN&gt;# (`silent=True`), since these warnings originate from the ML framework&lt;/SPAN&gt; &lt;SPAN&gt;466&lt;/SPAN&gt; &lt;SPAN&gt;# or one of its dependencies and are likely relevant to the caller&lt;/SPAN&gt; &lt;SPAN&gt;467&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; NonMlflowWarningsBehaviorForCurrentThread( &lt;SPAN&gt;468&lt;/SPAN&gt; disable_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;469&lt;/SPAN&gt; reroute_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;470&lt;/SPAN&gt; ): &lt;SPAN class=""&gt;--&amp;gt; 471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/base.py:203&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;Estimator.fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset, params)&lt;/SPAN&gt; &lt;SPAN&gt;201&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;copy(params)&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;202&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 203&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;204&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;205&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; &lt;SPAN class=""&gt;TypeError&lt;/SPAN&gt;( &lt;SPAN&gt;206&lt;/SPAN&gt; 
&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Params must be either a param map or a list/tuple of param maps, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;207&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;but got &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;%&lt;/SPAN&gt; &lt;SPAN&gt;type&lt;/SPAN&gt;(params) &lt;SPAN&gt;208&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py:1136&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_SparkXGBEstimator._fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset)&lt;/SPAN&gt; &lt;SPAN&gt;1123&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; ret[&lt;SPAN&gt;0&lt;/SPAN&gt;], ret[&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;1125&lt;/SPAN&gt; get_logger(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;XGBoost-PySpark&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;info( &lt;SPAN&gt;1126&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Running xgboost-&lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt; on &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt; workers with&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;1127&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN class=""&gt;\t&lt;/SPAN&gt;&lt;SPAN&gt;booster params: &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN class=""&gt;(...)&lt;/SPAN&gt; &lt;SPAN&gt;1134&lt;/SPAN&gt; dmatrix_kwargs, &lt;SPAN&gt;1135&lt;/SPAN&gt; ) &lt;SPAN class=""&gt;-&amp;gt; 1136&lt;/SPAN&gt; (config, booster) &lt;SPAN&gt;=&lt;/SPAN&gt; _run_job() &lt;SPAN&gt;1137&lt;/SPAN&gt; get_logger(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;XGBoost-PySpark&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;info(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Finished xgboost training!&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;) &lt;SPAN&gt;1139&lt;/SPAN&gt; 
result_xgb_model &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_convert_to_sklearn_model( &lt;SPAN&gt;1140&lt;/SPAN&gt; &lt;SPAN&gt;bytearray&lt;/SPAN&gt;(booster, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;utf-8&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;), config &lt;SPAN&gt;1141&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py:1122&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_SparkXGBEstimator._fit.&amp;lt;locals&amp;gt;._run_job&lt;/SPAN&gt;&lt;SPAN class=""&gt;()&lt;/SPAN&gt; &lt;SPAN&gt;1113&lt;/SPAN&gt; rdd &lt;SPAN&gt;=&lt;/SPAN&gt; ( &lt;SPAN&gt;1114&lt;/SPAN&gt; dataset&lt;SPAN&gt;.&lt;/SPAN&gt;mapInPandas( &lt;SPAN&gt;1115&lt;/SPAN&gt; _train_booster, &lt;SPAN&gt;# type: ignore&lt;/SPAN&gt; &lt;SPAN class=""&gt;(...)&lt;/SPAN&gt; &lt;SPAN&gt;1119&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;mapPartitions(&lt;SPAN class=""&gt;lambda&lt;/SPAN&gt; x: x) &lt;SPAN&gt;1120&lt;/SPAN&gt; ) &lt;SPAN&gt;1121&lt;/SPAN&gt; rdd_with_resource &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_try_stage_level_scheduling(rdd) &lt;SPAN class=""&gt;-&amp;gt; 1122&lt;/SPAN&gt; ret &lt;SPAN&gt;=&lt;/SPAN&gt; rdd_with_resource&lt;SPAN&gt;.&lt;/SPAN&gt;collect()[&lt;SPAN&gt;0&lt;/SPAN&gt;] &lt;SPAN&gt;1123&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; ret[&lt;SPAN&gt;0&lt;/SPAN&gt;], ret[&lt;SPAN&gt;1&lt;/SPAN&gt;]&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/instrumentation_utils.py:47&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_wrap_function.&amp;lt;locals&amp;gt;.wrapper&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;45&lt;/SPAN&gt; start &lt;SPAN&gt;=&lt;/SPAN&gt; time&lt;SPAN&gt;.&lt;/SPAN&gt;perf_counter() &lt;SPAN&gt;46&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 47&lt;/SPAN&gt; res &lt;SPAN&gt;=&lt;/SPAN&gt; 
func(&lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;48&lt;/SPAN&gt; logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_success( &lt;SPAN&gt;49&lt;/SPAN&gt; module_name, class_name, function_name, time&lt;SPAN&gt;.&lt;/SPAN&gt;perf_counter() &lt;SPAN&gt;-&lt;/SPAN&gt; start, signature &lt;SPAN&gt;50&lt;/SPAN&gt; ) &lt;SPAN&gt;51&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; res&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/core/rdd.py:1721&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;RDD.collect&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self)&lt;/SPAN&gt; &lt;SPAN&gt;1719&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; SCCallSiteSync(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;context): &lt;SPAN&gt;1720&lt;/SPAN&gt; &lt;SPAN class=""&gt;assert&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;ctx&lt;SPAN&gt;.&lt;/SPAN&gt;_jvm &lt;SPAN class=""&gt;is&lt;/SPAN&gt; &lt;SPAN class=""&gt;not&lt;/SPAN&gt; &lt;SPAN class=""&gt;None&lt;/SPAN&gt; &lt;SPAN class=""&gt;-&amp;gt; 1721&lt;/SPAN&gt; sock_info &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;ctx&lt;SPAN&gt;.&lt;/SPAN&gt;_jvm&lt;SPAN&gt;.&lt;/SPAN&gt;PythonRDD&lt;SPAN&gt;.&lt;/SPAN&gt;collectAndServe(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_jrdd&lt;SPAN&gt;.&lt;/SPAN&gt;rdd()) &lt;SPAN&gt;1722&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;list&lt;/SPAN&gt;(_load_from_socket(sock_info, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_jrdd_deserializer))&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py:1362&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;JavaMember.__call__&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args)&lt;/SPAN&gt; &lt;SPAN&gt;1356&lt;/SPAN&gt; command &lt;SPAN&gt;=&lt;/SPAN&gt; proto&lt;SPAN&gt;.&lt;/SPAN&gt;CALL_COMMAND_NAME 
&lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1357&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;command_header &lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1358&lt;/SPAN&gt; args_command &lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1359&lt;/SPAN&gt; proto&lt;SPAN&gt;.&lt;/SPAN&gt;END_COMMAND_PART &lt;SPAN&gt;1361&lt;/SPAN&gt; answer &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;gateway_client&lt;SPAN&gt;.&lt;/SPAN&gt;send_command(command) &lt;SPAN class=""&gt;-&amp;gt; 1362&lt;/SPAN&gt; return_value &lt;SPAN&gt;=&lt;/SPAN&gt; get_return_value( &lt;SPAN&gt;1363&lt;/SPAN&gt; answer, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;gateway_client, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;target_id, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;name) &lt;SPAN&gt;1365&lt;/SPAN&gt; &lt;SPAN class=""&gt;for&lt;/SPAN&gt; temp_arg &lt;SPAN class=""&gt;in&lt;/SPAN&gt; temp_args: &lt;SPAN&gt;1366&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;hasattr&lt;/SPAN&gt;(temp_arg, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;_detach&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;):&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/errors/exceptions/captured.py:269&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;capture_sql_exception.&amp;lt;locals&amp;gt;.deco&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*a, **kw)&lt;/SPAN&gt; &lt;SPAN&gt;266&lt;/SPAN&gt; &lt;SPAN class=""&gt;from&lt;/SPAN&gt; &lt;SPAN class=""&gt;py4j&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;protocol&lt;/SPAN&gt; &lt;SPAN class=""&gt;import&lt;/SPAN&gt; Py4JJavaError &lt;SPAN&gt;268&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 269&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; f(&lt;SPAN&gt;*&lt;/SPAN&gt;a, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kw) 
&lt;SPAN&gt;270&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; Py4JJavaError &lt;SPAN class=""&gt;as&lt;/SPAN&gt; e: &lt;SPAN&gt;271&lt;/SPAN&gt; converted &lt;SPAN&gt;=&lt;/SPAN&gt; convert_exception(e&lt;SPAN&gt;.&lt;/SPAN&gt;java_exception)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py:327&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;get_return_value&lt;/SPAN&gt;&lt;SPAN class=""&gt;(answer, gateway_client, target_id, name)&lt;/SPAN&gt; &lt;SPAN&gt;325&lt;/SPAN&gt; value &lt;SPAN&gt;=&lt;/SPAN&gt; OUTPUT_CONVERTER[&lt;SPAN&gt;type&lt;/SPAN&gt;](answer[&lt;SPAN&gt;2&lt;/SPAN&gt;:], gateway_client) &lt;SPAN&gt;326&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; answer[&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;==&lt;/SPAN&gt; REFERENCE_TYPE: &lt;SPAN class=""&gt;--&amp;gt; 327&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; Py4JJavaError( &lt;SPAN&gt;328&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;An error occurred while calling &lt;/SPAN&gt;&lt;SPAN class=""&gt;{0}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{1}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{2}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt; &lt;SPAN&gt;329&lt;/SPAN&gt; &lt;SPAN&gt;format&lt;/SPAN&gt;(target_id, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, name), value) &lt;SPAN&gt;330&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;331&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; Py4JError( &lt;SPAN&gt;332&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;An error occurred while calling &lt;/SPAN&gt;&lt;SPAN class=""&gt;{0}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{1}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{2}&lt;/SPAN&gt;&lt;SPAN&gt;. 
Trace:&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN class=""&gt;{3}&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt; &lt;SPAN&gt;333&lt;/SPAN&gt; &lt;SPAN&gt;format&lt;/SPAN&gt;(target_id, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, name, value))&lt;/DIV&gt;&lt;DIV class=""&gt;```&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Thu, 29 May 2025 17:26:57 GMT</pubDate>
    <dc:creator>spicysheep</dc:creator>
    <dc:date>2025-05-29T17:26:57Z</dc:date>
    <item>
      <title>Distributed SparkXGBRanker training: failed barrier ResultStage</title>
      <link>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/120487#M4092</link>
      <description>&lt;P&gt;I'm following a variation of the tutorial [here](&lt;A href="https://assets.docs.databricks.com/_extras/notebooks/source/xgboost-pyspark-new.html" target="_blank" rel="noopener"&gt;https://assets.docs.databricks.com/_extras/notebooks/source/xgboost-pyspark-new.html&lt;/A&gt;) to train a `&lt;SPAN&gt;SparkXGBRanker` in distributed mode. However, the line:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;pipeline_model &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; pipeline.&lt;/SPAN&gt;&lt;SPAN&gt;fit&lt;/SPAN&gt;&lt;SPAN&gt;(data)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;is throwing an error:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;gt; org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(12, 55) finished unsuccessfully.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I've read that I need to disable autoscaling on the cluster, which I've done. I'm using `&lt;DIV class=""&gt;&lt;SPAN class=""&gt;13.3 LTS ML (includes Apache Spark 3.4.1, Scala 2.12)`.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 29 May 2025 05:46:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/120487#M4092</guid>
      <dc:creator>spicysheep</dc:creator>
      <dc:date>2025-05-29T05:46:48Z</dc:date>
    </item>
    <item>
      <title>Re: Distributed SparkXGBRanker training: failed barrier ResultStage</title>
      <link>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/120569#M4094</link>
      <description>&lt;P&gt;I've also tried upgrading to&amp;nbsp;&lt;SPAN class=""&gt;16.4 LTS ML (includes Apache Spark 3.5.2, Scala 2.12).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;Full error below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;```&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;2025-05-29 17:13:53,553 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 76 workers with booster params: {'objective': 'rank:ndcg', 'colsample_bytree': 0.8, 'device': 'cpu', 'gamma': 4, 'grow_policy': 'lossguide', 'max_depth': 8, 'max_leaves': 128, 'min_child_weight': 6, 'alpha': 1, 'eta': 0.5, 'lambda': 3, 'num_round': 600, 'eval_metric': 'ndcg@5', 'nthread': 1} train_call_kwargs_params: {'early_stopping_rounds': 30, 'verbose_eval': False, 'num_boost_round': 100} dmatrix_kwargs: {'nthread': 1, 'missing': nan}&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Py4JJavaError: &lt;/SPAN&gt;An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(14, 3) finished unsuccessfully. 
org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py", line 1082, in _train_booster dtrain, dvalid = create_dmatrix_from_partitions( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/data.py", line 312, in create_dmatrix_from_partitions cache_partitions(iterator, append_fn) File "/databricks/python/lib/python3.12/site-packages/xgboost/spark/data.py", line 59, in cache_partitions train = part.loc[~part[alias.valid], :] ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1067, in __getitem__ return self._getitem_tuple(key) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1256, in _getitem_tuple return self._getitem_tuple_same_dim(tup) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 924, in _getitem_tuple_same_dim retval = getattr(retval, self.name)._getitem_axis(key, axis=i) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis return self._getitem_iterable(key, axis=axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer keyarr, indexer = ax._get_indexer_strict(key, axis_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/python/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6070, in _get_indexer_strict self._raise_if_missing(keyarr, indexer, axis_name) File 
"/databricks/python/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6130, in _raise_if_missing raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Int64Index([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n ...\n -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],\n dtype='int64', length=10000)] are in the [index]" at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:851) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:800) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:90) at org.apache.spark.api.python.PythonRDD$.writeNextElementToStream(PythonRDD.scala:504) at org.apache.spark.api.python.PythonRunner$$anon$2.writeNextInputToStream(PythonRunner.scala:1283) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:1191) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:1105) at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244) at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263) at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:1310) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:1302) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:800) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at 
scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1113) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$2(ResultTask.scala:76) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:76) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:227) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:204) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:166) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104) at 
com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:160) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:105) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$11(Executor.scala:1228) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:111) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1232) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:1088) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) at org.apache.spark.scheduler.DAGScheduler.$anonfun$failJobAndIndependentStages$1(DAGScheduler.scala:4472) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:4470) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:4382) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:4369) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:4369) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:3737) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4730) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.liftedTree1$1(DAGScheduler.scala:4634) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4633) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4619) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:55) at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1512) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1498) at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3254) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1111) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:461) at org.apache.spark.rdd.RDD.collect(RDD.scala:1109) at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:319) at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397) at py4j.Gateway.invoke(Gateway.java:306) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:197) at py4j.ClientServerConnection.run(ClientServerConnection.java:117) at java.base/java.lang.Thread.run(Thread.java:840)&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;&lt;A target="_blank"&gt;&amp;lt;command-5246778553375285&amp;gt;&lt;/A&gt;, line 1&lt;/SPAN&gt; &lt;SPAN class=""&gt;----&amp;gt; 1&lt;/SPAN&gt; pipeline_model &lt;SPAN&gt;=&lt;/SPAN&gt; pipeline&lt;SPAN&gt;.&lt;/SPAN&gt;fit(exploded) &lt;SPAN&gt;2&lt;/SPAN&gt; ranker_model &lt;SPAN&gt;=&lt;/SPAN&gt; pipeline_model&lt;SPAN&gt;.&lt;/SPAN&gt;stages[&lt;SPAN&gt;-&lt;/SPAN&gt;&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;4&lt;/SPAN&gt; native_booster &lt;SPAN&gt;=&lt;/SPAN&gt; ranker_model&lt;SPAN&gt;.&lt;/SPAN&gt;get_booster()&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python_shell/lib/dbruntime/MLWorkloadsInstrumentation/_pyspark.py:30&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_create_patch_function.&amp;lt;locals&amp;gt;.patched_method&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;28&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;False&lt;/SPAN&gt; &lt;SPAN&gt;29&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 30&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; original_method(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;31&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;True&lt;/SPAN&gt; &lt;SPAN&gt;32&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; result&lt;/DIV&gt;&lt;DIV 
class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:483&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;479&lt;/SPAN&gt; call_original &lt;SPAN&gt;=&lt;/SPAN&gt; update_wrapper_extended(call_original, original) &lt;SPAN&gt;481&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_start(args, kwargs) &lt;SPAN class=""&gt;--&amp;gt; 483&lt;/SPAN&gt; patch_function(call_original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;485&lt;/SPAN&gt; session&lt;SPAN&gt;.&lt;/SPAN&gt;state &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;succeeded&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;486&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_success(args, kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:182&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;with_managed_run.&amp;lt;locals&amp;gt;.patch_with_managed_run&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;179&lt;/SPAN&gt; managed_run &lt;SPAN&gt;=&lt;/SPAN&gt; create_managed_run() &lt;SPAN&gt;181&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 182&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; patch_function(original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;183&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; (&lt;SPAN class=""&gt;Exception&lt;/SPAN&gt;, &lt;SPAN class=""&gt;KeyboardInterrupt&lt;/SPAN&gt;): &lt;SPAN&gt;184&lt;/SPAN&gt; &lt;SPAN&gt;# In addition to standard Python exceptions, handle 
keyboard interrupts to ensure&lt;/SPAN&gt; &lt;SPAN&gt;185&lt;/SPAN&gt; &lt;SPAN&gt;# that runs are terminated if a user prematurely interrupts training execution&lt;/SPAN&gt; &lt;SPAN&gt;186&lt;/SPAN&gt; &lt;SPAN&gt;# (e.g. via sigint / ctrl-c)&lt;/SPAN&gt; &lt;SPAN&gt;187&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; managed_run:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1172&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.patched_fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1170&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; t&lt;SPAN&gt;.&lt;/SPAN&gt;should_log(): &lt;SPAN&gt;1171&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; _AUTOLOGGING_METRICS_MANAGER&lt;SPAN&gt;.&lt;/SPAN&gt;disable_log_post_training_metrics(): &lt;SPAN class=""&gt;-&amp;gt; 1172&lt;/SPAN&gt; fit_result &lt;SPAN&gt;=&lt;/SPAN&gt; fit_mlflow(original, &lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;1173&lt;/SPAN&gt; &lt;SPAN&gt;# In some cases the `fit_result` may be an iterator of spark models.&lt;/SPAN&gt; &lt;SPAN&gt;1174&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; should_log_post_training_metrics &lt;SPAN class=""&gt;and&lt;/SPAN&gt; &lt;SPAN&gt;isinstance&lt;/SPAN&gt;(fit_result, Model):&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1158&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.fit_mlflow&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1156&lt;/SPAN&gt; input_training_df &lt;SPAN&gt;=&lt;/SPAN&gt; args[&lt;SPAN&gt;0&lt;/SPAN&gt;]&lt;SPAN&gt;.&lt;/SPAN&gt;persist(StorageLevel&lt;SPAN&gt;.&lt;/SPAN&gt;MEMORY_AND_DISK) &lt;SPAN&gt;1157&lt;/SPAN&gt; 
_log_pretraining_metadata(estimator, params, input_training_df) &lt;SPAN class=""&gt;-&amp;gt; 1158&lt;/SPAN&gt; spark_model &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;1159&lt;/SPAN&gt; _log_posttraining_metadata(estimator, spark_model, params, input_training_df) &lt;SPAN&gt;1160&lt;/SPAN&gt; input_training_df&lt;SPAN&gt;.&lt;/SPAN&gt;unpersist()&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:474&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*og_args, **og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result &lt;SPAN class=""&gt;--&amp;gt; 474&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:425&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original_fn_with_event_logging&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original_fn, og_args, og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;422&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN&gt;423&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_start(og_args, og_kwargs) &lt;SPAN class=""&gt;--&amp;gt; 425&lt;/SPAN&gt; original_fn_result &lt;SPAN&gt;=&lt;/SPAN&gt; original_fn(&lt;SPAN&gt;*&lt;/SPAN&gt;og_args, 
&lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;og_kwargs) &lt;SPAN&gt;427&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_success(og_args, og_kwargs) &lt;SPAN&gt;428&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_fn_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:471&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original.&amp;lt;locals&amp;gt;._original_fn&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*_og_args, **_og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;463&lt;/SPAN&gt; &lt;SPAN&gt;# Show all non-MLflow warnings as normal (i.e. not as event logs)&lt;/SPAN&gt; &lt;SPAN&gt;464&lt;/SPAN&gt; &lt;SPAN&gt;# during original function execution, even if silent mode is enabled&lt;/SPAN&gt; &lt;SPAN&gt;465&lt;/SPAN&gt; &lt;SPAN&gt;# (`silent=True`), since these warnings originate from the ML framework&lt;/SPAN&gt; &lt;SPAN&gt;466&lt;/SPAN&gt; &lt;SPAN&gt;# or one of its dependencies and are likely relevant to the caller&lt;/SPAN&gt; &lt;SPAN&gt;467&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; NonMlflowWarningsBehaviorForCurrentThread( &lt;SPAN&gt;468&lt;/SPAN&gt; disable_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;469&lt;/SPAN&gt; reroute_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;470&lt;/SPAN&gt; ): &lt;SPAN class=""&gt;--&amp;gt; 471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/base.py:203&lt;/SPAN&gt;, in 
&lt;SPAN class=""&gt;Estimator.fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset, params)&lt;/SPAN&gt; &lt;SPAN&gt;201&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;copy(params)&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;202&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 203&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;204&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;205&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; &lt;SPAN class=""&gt;TypeError&lt;/SPAN&gt;( &lt;SPAN&gt;206&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Params must be either a param map or a list/tuple of param maps, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;207&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;but got &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;%&lt;/SPAN&gt; &lt;SPAN&gt;type&lt;/SPAN&gt;(params) &lt;SPAN&gt;208&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/pipeline.py:136&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;Pipeline._fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset)&lt;/SPAN&gt; &lt;SPAN&gt;134&lt;/SPAN&gt; dataset &lt;SPAN&gt;=&lt;/SPAN&gt; stage&lt;SPAN&gt;.&lt;/SPAN&gt;transform(dataset) &lt;SPAN&gt;135&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;# must be an Estimator&lt;/SPAN&gt; &lt;SPAN class=""&gt;--&amp;gt; 136&lt;/SPAN&gt; model &lt;SPAN&gt;=&lt;/SPAN&gt; stage&lt;SPAN&gt;.&lt;/SPAN&gt;fit(dataset) &lt;SPAN&gt;137&lt;/SPAN&gt; transformers&lt;SPAN&gt;.&lt;/SPAN&gt;append(model) &lt;SPAN&gt;138&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; i &lt;SPAN&gt;&amp;lt;&lt;/SPAN&gt; indexOfLastEstimator:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN 
class=""&gt;/databricks/python_shell/lib/dbruntime/MLWorkloadsInstrumentation/_pyspark.py:30&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_create_patch_function.&amp;lt;locals&amp;gt;.patched_method&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;28&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;False&lt;/SPAN&gt; &lt;SPAN&gt;29&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 30&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; original_method(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;31&lt;/SPAN&gt; call_succeeded &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;True&lt;/SPAN&gt; &lt;SPAN&gt;32&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:483&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;479&lt;/SPAN&gt; call_original &lt;SPAN&gt;=&lt;/SPAN&gt; update_wrapper_extended(call_original, original) &lt;SPAN&gt;481&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_start(args, kwargs) &lt;SPAN class=""&gt;--&amp;gt; 483&lt;/SPAN&gt; patch_function(call_original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;485&lt;/SPAN&gt; session&lt;SPAN&gt;.&lt;/SPAN&gt;state &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;succeeded&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;486&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_patch_function_success(args, kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN 
class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:182&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;with_managed_run.&amp;lt;locals&amp;gt;.patch_with_managed_run&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;179&lt;/SPAN&gt; managed_run &lt;SPAN&gt;=&lt;/SPAN&gt; create_managed_run() &lt;SPAN&gt;181&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 182&lt;/SPAN&gt; result &lt;SPAN&gt;=&lt;/SPAN&gt; patch_function(original, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;183&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; (&lt;SPAN class=""&gt;Exception&lt;/SPAN&gt;, &lt;SPAN class=""&gt;KeyboardInterrupt&lt;/SPAN&gt;): &lt;SPAN&gt;184&lt;/SPAN&gt; &lt;SPAN&gt;# In addition to standard Python exceptions, handle keyboard interrupts to ensure&lt;/SPAN&gt; &lt;SPAN&gt;185&lt;/SPAN&gt; &lt;SPAN&gt;# that runs are terminated if a user prematurely interrupts training execution&lt;/SPAN&gt; &lt;SPAN&gt;186&lt;/SPAN&gt; &lt;SPAN&gt;# (e.g. 
via sigint / ctrl-c)&lt;/SPAN&gt; &lt;SPAN&gt;187&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; managed_run:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/pyspark/ml/__init__.py:1180&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;autolog.&amp;lt;locals&amp;gt;.patched_fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original, self, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;1178&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; fit_result &lt;SPAN&gt;1179&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;-&amp;gt; 1180&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original(&lt;SPAN&gt;self&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:474&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*og_args, **og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result &lt;SPAN class=""&gt;--&amp;gt; 474&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:425&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original_fn_with_event_logging&lt;/SPAN&gt;&lt;SPAN class=""&gt;(original_fn, og_args, og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;422&lt;/SPAN&gt; &lt;SPAN 
class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN&gt;423&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_start(og_args, og_kwargs) &lt;SPAN class=""&gt;--&amp;gt; 425&lt;/SPAN&gt; original_fn_result &lt;SPAN&gt;=&lt;/SPAN&gt; original_fn(&lt;SPAN&gt;*&lt;/SPAN&gt;og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;og_kwargs) &lt;SPAN&gt;427&lt;/SPAN&gt; event_logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_original_function_success(og_args, og_kwargs) &lt;SPAN&gt;428&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_fn_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/mlflow/utils/autologging_utils/safety.py:471&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;safe_patch.&amp;lt;locals&amp;gt;.safe_patch_function.&amp;lt;locals&amp;gt;.call_original.&amp;lt;locals&amp;gt;._original_fn&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*_og_args, **_og_kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;463&lt;/SPAN&gt; &lt;SPAN&gt;# Show all non-MLflow warnings as normal (i.e. 
not as event logs)&lt;/SPAN&gt; &lt;SPAN&gt;464&lt;/SPAN&gt; &lt;SPAN&gt;# during original function execution, even if silent mode is enabled&lt;/SPAN&gt; &lt;SPAN&gt;465&lt;/SPAN&gt; &lt;SPAN&gt;# (`silent=True`), since these warnings originate from the ML framework&lt;/SPAN&gt; &lt;SPAN&gt;466&lt;/SPAN&gt; &lt;SPAN&gt;# or one of its dependencies and are likely relevant to the caller&lt;/SPAN&gt; &lt;SPAN&gt;467&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; NonMlflowWarningsBehaviorForCurrentThread( &lt;SPAN&gt;468&lt;/SPAN&gt; disable_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;469&lt;/SPAN&gt; reroute_warnings&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;False&lt;/SPAN&gt;, &lt;SPAN&gt;470&lt;/SPAN&gt; ): &lt;SPAN class=""&gt;--&amp;gt; 471&lt;/SPAN&gt; original_result &lt;SPAN&gt;=&lt;/SPAN&gt; original(&lt;SPAN&gt;*&lt;/SPAN&gt;_og_args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;_og_kwargs) &lt;SPAN&gt;472&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; original_result&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/ml/base.py:203&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;Estimator.fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset, params)&lt;/SPAN&gt; &lt;SPAN&gt;201&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;copy(params)&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;202&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 203&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_fit(dataset) &lt;SPAN&gt;204&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;205&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; &lt;SPAN class=""&gt;TypeError&lt;/SPAN&gt;( &lt;SPAN&gt;206&lt;/SPAN&gt; 
&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Params must be either a param map or a list/tuple of param maps, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;207&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;but got &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;%&lt;/SPAN&gt; &lt;SPAN&gt;type&lt;/SPAN&gt;(params) &lt;SPAN&gt;208&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py:1136&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_SparkXGBEstimator._fit&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, dataset)&lt;/SPAN&gt; &lt;SPAN&gt;1123&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; ret[&lt;SPAN&gt;0&lt;/SPAN&gt;], ret[&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;1125&lt;/SPAN&gt; get_logger(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;XGBoost-PySpark&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;info( &lt;SPAN&gt;1126&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Running xgboost-&lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt; on &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt; workers with&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;1127&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN class=""&gt;\t&lt;/SPAN&gt;&lt;SPAN&gt;booster params: &lt;/SPAN&gt;&lt;SPAN class=""&gt;%s&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN class=""&gt;(...)&lt;/SPAN&gt; &lt;SPAN&gt;1134&lt;/SPAN&gt; dmatrix_kwargs, &lt;SPAN&gt;1135&lt;/SPAN&gt; ) &lt;SPAN class=""&gt;-&amp;gt; 1136&lt;/SPAN&gt; (config, booster) &lt;SPAN&gt;=&lt;/SPAN&gt; _run_job() &lt;SPAN&gt;1137&lt;/SPAN&gt; get_logger(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;XGBoost-PySpark&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;info(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;Finished xgboost training!&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;) &lt;SPAN&gt;1139&lt;/SPAN&gt; 
result_xgb_model &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_convert_to_sklearn_model( &lt;SPAN&gt;1140&lt;/SPAN&gt; &lt;SPAN&gt;bytearray&lt;/SPAN&gt;(booster, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;utf-8&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;), config &lt;SPAN&gt;1141&lt;/SPAN&gt; )&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python/lib/python3.12/site-packages/xgboost/spark/core.py:1122&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_SparkXGBEstimator._fit.&amp;lt;locals&amp;gt;._run_job&lt;/SPAN&gt;&lt;SPAN class=""&gt;()&lt;/SPAN&gt; &lt;SPAN&gt;1113&lt;/SPAN&gt; rdd &lt;SPAN&gt;=&lt;/SPAN&gt; ( &lt;SPAN&gt;1114&lt;/SPAN&gt; dataset&lt;SPAN&gt;.&lt;/SPAN&gt;mapInPandas( &lt;SPAN&gt;1115&lt;/SPAN&gt; _train_booster, &lt;SPAN&gt;# type: ignore&lt;/SPAN&gt; &lt;SPAN class=""&gt;(...)&lt;/SPAN&gt; &lt;SPAN&gt;1119&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;mapPartitions(&lt;SPAN class=""&gt;lambda&lt;/SPAN&gt; x: x) &lt;SPAN&gt;1120&lt;/SPAN&gt; ) &lt;SPAN&gt;1121&lt;/SPAN&gt; rdd_with_resource &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_try_stage_level_scheduling(rdd) &lt;SPAN class=""&gt;-&amp;gt; 1122&lt;/SPAN&gt; ret &lt;SPAN&gt;=&lt;/SPAN&gt; rdd_with_resource&lt;SPAN&gt;.&lt;/SPAN&gt;collect()[&lt;SPAN&gt;0&lt;/SPAN&gt;] &lt;SPAN&gt;1123&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; ret[&lt;SPAN&gt;0&lt;/SPAN&gt;], ret[&lt;SPAN&gt;1&lt;/SPAN&gt;]&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/instrumentation_utils.py:47&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;_wrap_function.&amp;lt;locals&amp;gt;.wrapper&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;45&lt;/SPAN&gt; start &lt;SPAN&gt;=&lt;/SPAN&gt; time&lt;SPAN&gt;.&lt;/SPAN&gt;perf_counter() &lt;SPAN&gt;46&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;---&amp;gt; 47&lt;/SPAN&gt; res &lt;SPAN&gt;=&lt;/SPAN&gt; 
func(&lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;48&lt;/SPAN&gt; logger&lt;SPAN&gt;.&lt;/SPAN&gt;log_success( &lt;SPAN&gt;49&lt;/SPAN&gt; module_name, class_name, function_name, time&lt;SPAN&gt;.&lt;/SPAN&gt;perf_counter() &lt;SPAN&gt;-&lt;/SPAN&gt; start, signature &lt;SPAN&gt;50&lt;/SPAN&gt; ) &lt;SPAN&gt;51&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; res&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/core/rdd.py:1721&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;RDD.collect&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self)&lt;/SPAN&gt; &lt;SPAN&gt;1719&lt;/SPAN&gt; &lt;SPAN class=""&gt;with&lt;/SPAN&gt; SCCallSiteSync(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;context): &lt;SPAN&gt;1720&lt;/SPAN&gt; &lt;SPAN class=""&gt;assert&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;ctx&lt;SPAN&gt;.&lt;/SPAN&gt;_jvm &lt;SPAN class=""&gt;is&lt;/SPAN&gt; &lt;SPAN class=""&gt;not&lt;/SPAN&gt; &lt;SPAN class=""&gt;None&lt;/SPAN&gt; &lt;SPAN class=""&gt;-&amp;gt; 1721&lt;/SPAN&gt; sock_info &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;ctx&lt;SPAN&gt;.&lt;/SPAN&gt;_jvm&lt;SPAN&gt;.&lt;/SPAN&gt;PythonRDD&lt;SPAN&gt;.&lt;/SPAN&gt;collectAndServe(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_jrdd&lt;SPAN&gt;.&lt;/SPAN&gt;rdd()) &lt;SPAN&gt;1722&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;list&lt;/SPAN&gt;(_load_from_socket(sock_info, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;_jrdd_deserializer))&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py:1362&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;JavaMember.__call__&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, *args)&lt;/SPAN&gt; &lt;SPAN&gt;1356&lt;/SPAN&gt; command &lt;SPAN&gt;=&lt;/SPAN&gt; proto&lt;SPAN&gt;.&lt;/SPAN&gt;CALL_COMMAND_NAME 
&lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1357&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;command_header &lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1358&lt;/SPAN&gt; args_command &lt;SPAN&gt;+&lt;/SPAN&gt;\ &lt;SPAN&gt;1359&lt;/SPAN&gt; proto&lt;SPAN&gt;.&lt;/SPAN&gt;END_COMMAND_PART &lt;SPAN&gt;1361&lt;/SPAN&gt; answer &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;gateway_client&lt;SPAN&gt;.&lt;/SPAN&gt;send_command(command) &lt;SPAN class=""&gt;-&amp;gt; 1362&lt;/SPAN&gt; return_value &lt;SPAN&gt;=&lt;/SPAN&gt; get_return_value( &lt;SPAN&gt;1363&lt;/SPAN&gt; answer, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;gateway_client, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;target_id, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;name) &lt;SPAN&gt;1365&lt;/SPAN&gt; &lt;SPAN class=""&gt;for&lt;/SPAN&gt; temp_arg &lt;SPAN class=""&gt;in&lt;/SPAN&gt; temp_args: &lt;SPAN&gt;1366&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;hasattr&lt;/SPAN&gt;(temp_arg, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;_detach&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;):&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/errors/exceptions/captured.py:269&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;capture_sql_exception.&amp;lt;locals&amp;gt;.deco&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*a, **kw)&lt;/SPAN&gt; &lt;SPAN&gt;266&lt;/SPAN&gt; &lt;SPAN class=""&gt;from&lt;/SPAN&gt; &lt;SPAN class=""&gt;py4j&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;protocol&lt;/SPAN&gt; &lt;SPAN class=""&gt;import&lt;/SPAN&gt; Py4JJavaError &lt;SPAN&gt;268&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 269&lt;/SPAN&gt; &lt;SPAN class=""&gt;return&lt;/SPAN&gt; f(&lt;SPAN&gt;*&lt;/SPAN&gt;a, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kw) 
&lt;SPAN&gt;270&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; Py4JJavaError &lt;SPAN class=""&gt;as&lt;/SPAN&gt; e: &lt;SPAN&gt;271&lt;/SPAN&gt; converted &lt;SPAN&gt;=&lt;/SPAN&gt; convert_exception(e&lt;SPAN&gt;.&lt;/SPAN&gt;java_exception)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py:327&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;get_return_value&lt;/SPAN&gt;&lt;SPAN class=""&gt;(answer, gateway_client, target_id, name)&lt;/SPAN&gt; &lt;SPAN&gt;325&lt;/SPAN&gt; value &lt;SPAN&gt;=&lt;/SPAN&gt; OUTPUT_CONVERTER[&lt;SPAN&gt;type&lt;/SPAN&gt;](answer[&lt;SPAN&gt;2&lt;/SPAN&gt;:], gateway_client) &lt;SPAN&gt;326&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; answer[&lt;SPAN&gt;1&lt;/SPAN&gt;] &lt;SPAN&gt;==&lt;/SPAN&gt; REFERENCE_TYPE: &lt;SPAN class=""&gt;--&amp;gt; 327&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; Py4JJavaError( &lt;SPAN&gt;328&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;An error occurred while calling &lt;/SPAN&gt;&lt;SPAN class=""&gt;{0}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{1}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{2}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt; &lt;SPAN&gt;329&lt;/SPAN&gt; &lt;SPAN&gt;format&lt;/SPAN&gt;(target_id, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, name), value) &lt;SPAN&gt;330&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;: &lt;SPAN&gt;331&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; Py4JError( &lt;SPAN&gt;332&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;An error occurred while calling &lt;/SPAN&gt;&lt;SPAN class=""&gt;{0}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{1}&lt;/SPAN&gt;&lt;SPAN class=""&gt;{2}&lt;/SPAN&gt;&lt;SPAN&gt;. 
Trace:&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN class=""&gt;{3}&lt;/SPAN&gt;&lt;SPAN class=""&gt;\n&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt; &lt;SPAN&gt;333&lt;/SPAN&gt; &lt;SPAN&gt;format&lt;/SPAN&gt;(target_id, &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, name, value))&lt;/DIV&gt;&lt;DIV class=""&gt;```&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 29 May 2025 17:26:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/120569#M4094</guid>
      <dc:creator>spicysheep</dc:creator>
      <dc:date>2025-05-29T17:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Distributed SparkXGBRanker training: failed barrier ResultStage</title>
      <link>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/133760#M4338</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/111971"&gt;@spicysheep&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I see you are getting a&amp;nbsp;&lt;CODE&gt;failed barrier ResultStage&lt;/CODE&gt; error during distributed &lt;CODE&gt;SparkXGBRanker&lt;/CODE&gt; training.&lt;/P&gt;
&lt;P&gt;I believe this is a limitation of standard clusters, so distributed training tasks such as this barrier stage do not work there.&lt;/P&gt;
&lt;P&gt;For reference, see&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2020/11/16/how-to-train-xgboost-with-spark.html" target="_blank"&gt;https://www.databricks.com/blog/2020/11/16/how-to-train-xgboost-with-spark.html&lt;/A&gt;,&amp;nbsp;which mentions using a GPU instance.&lt;/P&gt;
&lt;P&gt;The limitations of standard compute are also listed at&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/compute/standard-limitations" target="_blank"&gt;https://docs.databricks.com/aws/en/compute/standard-limitations&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2025 04:09:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/133760#M4338</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-10-04T04:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Distributed SparkXGBRanker training: failed barrier ResultStage</title>
      <link>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/133761#M4339</link>
      <description>&lt;P&gt;You mentioned that you have already turned off&amp;nbsp;&lt;SPAN&gt;autoscaling; please also try setting&amp;nbsp;&lt;SPAN class="base"&gt;&lt;CODE&gt;num_workers&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;explicitly.&lt;/P&gt;
&lt;P&gt;Step 1: Disable dynamic resource allocation by setting &lt;CODE&gt;spark.dynamicAllocation.enabled&lt;/CODE&gt; to &lt;CODE&gt;false&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;Step 2: Configure &lt;CODE&gt;num_workers&lt;/CODE&gt; to match the fixed resources.&lt;/P&gt;
&lt;P&gt;After disabling dynamic allocation, make sure that the number of workers requested by &lt;CODE&gt;SparkXGBRanker&lt;/CODE&gt; does not exceed the number of executor cores actually available; ideally, the two should match.&lt;/P&gt;
&lt;P&gt;a. Compute the total cores: Total&amp;nbsp;Cores = (Number&amp;nbsp;of&amp;nbsp;Worker&amp;nbsp;Nodes) × (Cores&amp;nbsp;per&amp;nbsp;Worker)&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="base"&gt;b. Then, set the &lt;CODE&gt;num_workers&lt;/CODE&gt; parameter in your &lt;CODE&gt;SparkXGBRanker&lt;/CODE&gt; constructor to a value less than or equal to the total available cores.&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;from xgboost.spark import SparkXGBRanker

# Number of partitions in the training DataFrame. Note that this is not the
# core count: use your cluster's total worker cores if you want a fixed value.
num_partitions = data.rdd.getNumPartitions()

# Instantiate the ranker with a fixed num_workers that is no larger than
# the total number of cores on the cluster.
ranker = SparkXGBRanker(
    num_workers=num_partitions,  # or a number matching your cluster's total cores
    # ... other parameters
)

# `pipeline` is your existing Pipeline, with `ranker` as one of its stages.
pipeline_model = pipeline.fit(data)&lt;/LI-CODE&gt;
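&lt;P&gt;If it helps, the arithmetic in steps a and b can be sketched as a small helper. This is only an illustration: &lt;CODE&gt;pick_num_workers&lt;/CODE&gt; is a hypothetical name, the node and core counts in the example are made up, and &lt;CODE&gt;task_cpus&lt;/CODE&gt; is assumed to mirror your &lt;CODE&gt;spark.task.cpus&lt;/CODE&gt; setting (each XGBoost worker runs as one Spark task).&lt;/P&gt;

```python
def pick_num_workers(worker_nodes, cores_per_worker, task_cpus=1, requested=None):
    """Return a num_workers value that fits the cluster's fixed task slots.

    Hypothetical helper for illustration only; not part of xgboost or pyspark.
    """
    # Step a: total cores = (number of worker nodes) x (cores per worker)
    total_cores = worker_nodes * cores_per_worker

    # Each XGBoost worker is one Spark task and occupies task_cpus cores,
    # so this is how many barrier tasks can run at the same time.
    task_slots = total_cores // task_cpus
    if task_slots == 0:
        raise ValueError("cluster cannot run even one task with these settings")

    # Step b: never ask for more workers than there are slots.
    if requested is None:
        return task_slots
    return min(requested, task_slots)


# Example with made-up cluster sizes: 4 worker nodes with 8 cores each.
print(pick_num_workers(worker_nodes=4, cores_per_worker=8))                # 32
print(pick_num_workers(worker_nodes=4, cores_per_worker=8, requested=76))  # 32
```

&lt;P&gt;The returned value is what you would pass as &lt;CODE&gt;num_workers&lt;/CODE&gt;; if it comes out lower than the number you were requesting, that mismatch may be why the barrier stage cannot be scheduled.&lt;/P&gt;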
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2025 04:15:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/distributed-sparkxgbranker-training-failed-barrier-resultstage/m-p/133761#M4339</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-10-04T04:15:27Z</dc:date>
    </item>
  </channel>
</rss>

