<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Training Job Failure (Driver Error) in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100469#M3812</link>
    <description>&lt;P&gt;In our case, we needed to correct our dependent libraries. We had an incorrect path referenced.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Nov 2024 21:14:26 GMT</pubDate>
    <dc:creator>jonathanhodges</dc:creator>
    <dc:date>2024-11-29T21:14:26Z</dc:date>
    <item>
      <title>Training Job Failure (Driver Error)</title>
      <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/96895#M3752</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We have a new model training job that ran fine for a few days and then started failing. I have attached images with more details.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am wondering if the 'can't reach driver cluster' message is a red herring, since the driver is reported as healthy right before execution.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;When I look into the logs, it looks like a library problem, potentially with NumPy:&lt;BR /&gt;&lt;BR /&gt;ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject&lt;BR /&gt;Traceback (most recent call last):&lt;BR /&gt;from pandas._libs.interval import Interval&lt;BR /&gt;File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Has anyone seen this before, and does anyone have ideas or suggestions?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 00:50:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/96895#M3752</guid>
      <dc:creator>jonathanhodges</dc:creator>
      <dc:date>2024-10-31T00:50:12Z</dc:date>
    </item>
    <item>
      <title>Re: Training Job Failure (Driver Error)</title>
      <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/96896#M3753</link>
      <description>&lt;P&gt;Sorry, here is the full stack trace and one additional screenshot:&lt;/P&gt;&lt;P&gt;ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject&lt;BR /&gt;Traceback (most recent call last):&lt;BR /&gt;File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 37, in &amp;lt;module&amp;gt;&lt;BR /&gt;from dbruntime.PipMagicOverrides import PipMagicOverrides&lt;BR /&gt;File "/databricks/python_shell/dbruntime/PipMagicOverrides.py", line 8, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pyspark.sql.connect.session import SparkSession as RemoteSparkSession&lt;BR /&gt;File "/databricks/spark/python/pyspark/sql/connect/session.py", line 19, in &amp;lt;module&amp;gt;&lt;BR /&gt;check_dependencies(__name__)&lt;BR /&gt;File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies&lt;BR /&gt;require_minimum_pandas_version()&lt;BR /&gt;File "/databricks/spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version&lt;BR /&gt;import pandas&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 22, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas.compat import is_numpy_dev as _is_numpy_dev&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/compat/__init__.py", line 15, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas.compat.numpy import (&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/compat/numpy/__init__.py", line 4, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas.util.version import Version&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/util/__init__.py", line 1, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas.util._decorators import ( # noqa:F401&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/util/_decorators.py", line 14, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas._libs.properties import cache_readonly # noqa:F401&lt;BR /&gt;File "/databricks/python/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in &amp;lt;module&amp;gt;&lt;BR /&gt;from pandas._libs.interval import Interval&lt;BR /&gt;File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 00:51:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/96896#M3753</guid>
      <dc:creator>jonathanhodges</dc:creator>
      <dc:date>2024-10-31T00:51:18Z</dc:date>
    </item>
    <item>
      <title>Re: Training Job Failure (Driver Error)</title>
      <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100463#M3811</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/130534"&gt;@jonathanhodges&lt;/a&gt;&amp;nbsp;By any chance, did you manage to solve this issue?&lt;/P&gt;&lt;P&gt;We are having the same issue.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 19:34:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100463#M3811</guid>
      <dc:creator>Charuvil</dc:creator>
      <dc:date>2024-11-29T19:34:02Z</dc:date>
    </item>
    <item>
      <title>Re: Training Job Failure (Driver Error)</title>
      <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100469#M3812</link>
      <description>&lt;P&gt;In our case, we needed to correct our dependent libraries. We had an incorrect path referenced.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 21:14:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100469#M3812</guid>
      <dc:creator>jonathanhodges</dc:creator>
      <dc:date>2024-11-29T21:14:26Z</dc:date>
    </item>
    <item>
      <title>Re: Training Job Failure (Driver Error)</title>
      <link>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100499#M3814</link>
      <description>&lt;P&gt;Same in my case as well. It was related to NumPy version 2; downgrading NumPy fixed the issue. The error messages from Databricks were pretty confusing.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Nov 2024 14:27:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/training-job-failure-driver-error/m-p/100499#M3814</guid>
      <dc:creator>Charuvil</dc:creator>
      <dc:date>2024-11-30T14:27:15Z</dc:date>
    </item>
  </channel>
</rss>

