<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Mismatch cuda/cudnn version on Databricks Runtime GPU ML version in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/mismatch-cuda-cudnn-version-on-databricks-runtime-gpu-ml-version/m-p/133105#M4106</link>
    <description>&lt;P&gt;There could be library-related conflicts in 16.0 ML that were fixed in 16.4 ML. I always recommend using the LTS version. Thanks.&lt;/P&gt;</description>
    <pubDate>Fri, 26 Sep 2025 18:19:26 GMT</pubDate>
    <dc:creator>lin-yuan</dc:creator>
    <dc:date>2025-09-26T18:19:26Z</dc:date>
    <item>
      <title>Mismatch cuda/cudnn version on Databricks Runtime GPU ML version</title>
      <link>https://community.databricks.com/t5/administration-architecture/mismatch-cuda-cudnn-version-on-databricks-runtime-gpu-ml-version/m-p/116599#M3290</link>
      <description>&lt;P&gt;I have a cluster on Databricks with the configuration&amp;nbsp;&lt;SPAN class=""&gt;Databricks Runtime Version&lt;/SPAN&gt;&lt;SPAN class=""&gt; 16.4 LTS ML Beta (includes Apache Spark 3.5.2, GPU, Scala 2.12), and another cluster with the configuration 16.0 ML&amp;nbsp; (includes Apache Spark 3.5.2, GPU, Scala 2.12).&amp;nbsp; According to the documentation (&lt;A href="https://learn.microsoft.com/en-gb/azure/databricks/release-notes/runtime/16.4lts-ml" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-gb/azure/databricks/release-notes/runtime/16.4lts-ml&lt;/A&gt;), the GPU cluster has the following libraries installed:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;CUDA 12.6&lt;/LI&gt;&lt;LI&gt;cublas 12.6.0.22-1&lt;/LI&gt;&lt;LI&gt;cusolver 11.6.4.38-1&lt;/LI&gt;&lt;LI&gt;cupti 12.6.37-1&lt;/LI&gt;&lt;LI&gt;cusparse 12.5.2.23-1&lt;/LI&gt;&lt;LI&gt;cuDNN 9.3.0.75-1&lt;/LI&gt;&lt;LI&gt;NCCL 2.22.3&lt;/LI&gt;&lt;LI&gt;TensorRT 10.2.0.19-1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The documentation for 16.0 ML lists the same libraries.&lt;/P&gt;&lt;P&gt;However, when I print the CUDA/cuDNN versions on either cluster, both report lower versions:&amp;nbsp;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; torch&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'CUDA:'&lt;/SPAN&gt;&lt;SPAN&gt;,torch.version.cuda)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cudnn &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; torch.backends.cudnn.&lt;/SPAN&gt;&lt;SPAN&gt;version&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cudnn_major &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; cudnn &lt;/SPAN&gt;&lt;SPAN&gt;//&lt;/SPAN&gt; &lt;SPAN&gt;10000&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cudnn &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; cudnn &lt;/SPAN&gt;&lt;SPAN&gt;%&lt;/SPAN&gt; 
&lt;SPAN&gt;10000&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cudnn_minor &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; cudnn &lt;/SPAN&gt;&lt;SPAN&gt;//&lt;/SPAN&gt; &lt;SPAN&gt;100&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cudnn_patch &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; cudnn &lt;/SPAN&gt;&lt;SPAN&gt;%&lt;/SPAN&gt; &lt;SPAN&gt;100&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;( &lt;/SPAN&gt;&lt;SPAN&gt;'cuDNN:'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'.'&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;join&lt;/SPAN&gt;&lt;SPAN&gt;([&lt;/SPAN&gt;&lt;SPAN&gt;str&lt;/SPAN&gt;&lt;SPAN&gt;(cudnn_major),&lt;/SPAN&gt;&lt;SPAN&gt;str&lt;/SPAN&gt;&lt;SPAN&gt;(cudnn_minor),&lt;/SPAN&gt;&lt;SPAN&gt;str&lt;/SPAN&gt;&lt;SPAN&gt;(cudnn_patch)]) )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;```&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Output:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;CUDA: 12.4 &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cuDNN: 9.1.0&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Further, when I run a TensorFlow model training pipeline, the 16.4 LTS ML cluster runs without error, but the 16.0 ML cluster returns the following error:&lt;BR /&gt;Epoch 1/40 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1744832380.305571 2695 service.cc:148] XLA service 0x7f9138003620 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: I0000 00:00:1744832380.305600 2695 service.cc:156] StreamExecutor device (0): Tesla T4, Compute Capability 7.5 2025-04-16 19:39:42.151334: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable. E0000 00:00:1744832387.521437 2695 cuda_dnn.cc:522] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.3.0. 
CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. E0000 00:00:1744832390.408174 2695 cuda_dnn.cc:522] &lt;STRONG&gt;Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.3.0.&lt;/STRONG&gt; CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.&amp;nbsp;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Please let me know why this situation happens, and how to avoid it in the future. Thanks!&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
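The compatibility rule quoted in the log (matching major version, equal or higher minor version) can be checked with a small helper. This is a minimal sketch, not part of the original post: the function names are hypothetical, and the integer inputs 90100 and 90300 are assumed encodings of the 9.1.0 and 9.3.0 versions reported above, following the cuDNN 9 scheme of MAJOR*10000 + MINOR*100 + PATCH.

```python
def decode_cudnn_version(version):
    """Split an integer cuDNN version into (major, minor, patch).

    Assumes the cuDNN 9 encoding MAJOR*10000 + MINOR*100 + PATCH,
    e.g. 90100 for 9.1.0. Older releases used MAJOR*1000, so values
    below 10000 are decoded with that scheme instead.
    """
    base = 10000 if version >= 10000 else 1000
    major, rest = divmod(version, base)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def runtime_satisfies_build(loaded, built):
    """Apply the rule from the log: same major, equal or higher minor."""
    return loaded[0] == built[0] and loaded[1] >= built[1]


# Assumed integer forms of the versions named in the error message.
loaded = decode_cudnn_version(90100)  # runtime library: 9.1.0
built = decode_cudnn_version(90300)   # version the source was compiled with: 9.3.0
print(loaded, built, runtime_satisfies_build(loaded, built))
```

For the versions in the log, the check fails because the loaded minor version (1) is lower than the compiled minor version (3).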
      <pubDate>Fri, 25 Apr 2025 15:52:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/mismatch-cuda-cudnn-version-on-databricks-runtime-gpu-ml-version/m-p/116599#M3290</guid>
      <dc:creator>chloe_nm</dc:creator>
      <dc:date>2025-04-25T15:52:22Z</dc:date>
    </item>
    <item>
      <title>Re: Mismatch cuda/cudnn version on Databricks Runtime GPU ML version</title>
      <link>https://community.databricks.com/t5/administration-architecture/mismatch-cuda-cudnn-version-on-databricks-runtime-gpu-ml-version/m-p/133105#M4106</link>
      <description>&lt;P&gt;There could be library-related conflicts in 16.0 ML that were fixed in 16.4 ML. I always recommend using the LTS version. Thanks.&lt;/P&gt;</description>
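To look for the kind of library conflict suggested above, one option is to list the Python packages in the cluster environment that relate to CUDA, cuDNN, PyTorch, or TensorFlow. The sketch below is an assumption about how to investigate, not a confirmed diagnosis from this thread: the helper name and keyword list are hypothetical, and it relies only on the standard-library `importlib.metadata` module. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit version listed in the runtime release notes.

```python
from importlib import metadata


def find_distributions(keywords=("cudnn", "cuda", "torch", "tensorflow")):
    """Return {name: version} for installed packages matching any keyword."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata.
        if name and any(key in name.lower() for key in keywords):
            found[name] = dist.version
    return found


# Print matching packages so bundled GPU libraries can be compared
# against the versions in the runtime release notes.
for name, version in sorted(find_distributions().items()):
    print(name, version)
```

Running this on both clusters and comparing the output may show whether a pip-installed cuDNN package differs between 16.0 ML and 16.4 LTS ML.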
      <pubDate>Fri, 26 Sep 2025 18:19:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/mismatch-cuda-cudnn-version-on-databricks-runtime-gpu-ml-version/m-p/133105#M4106</guid>
      <dc:creator>lin-yuan</dc:creator>
      <dc:date>2025-09-26T18:19:26Z</dc:date>
    </item>
  </channel>
</rss>

