<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why is GPU accelerated node much slower than CPU node for training a random forest model on databricks? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27233#M1561</link>
    <description>&lt;P&gt;I have a dataset of about 5 million rows with 14 features and a binary target. I decided to train a pyspark random forest classifier on Databricks. The CPU cluster I created contains 2 c4.8xlarge workers (60 GB, 36 cores) and 1 r4.xlarge (31 GB, 4 cores) driver. The GPU cluster I created contains 3 g4dn.4xlarge (64 GB, 16 cores) nodes, 2 as workers and 1 as driver. The hourly costs are very similar. I assumed the GPU cluster would outperform, since random forest is an algorithm well suited to parallel computing, but the result kinda shocked me: the GPU cluster trained the model nearly 5 times slower than the CPU cluster. Is there anything I misunderstood about GPU acceleration, or is it just not used for &lt;A href="https://pyspark.ml" alt="https://pyspark.ml" target="_blank"&gt;pyspark.ml&lt;/A&gt; modules?&lt;/P&gt;</description>
    <pubDate>Fri, 14 Oct 2022 17:07:02 GMT</pubDate>
    <dc:creator>zzy</dc:creator>
    <dc:date>2022-10-14T17:07:02Z</dc:date>
    <item>
      <title>Why is GPU accelerated node much slower than CPU node for training a random forest model on databricks?</title>
      <link>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27233#M1561</link>
      <description>&lt;P&gt;I have a dataset of about 5 million rows with 14 features and a binary target. I decided to train a pyspark random forest classifier on Databricks. The CPU cluster I created contains 2 c4.8xlarge workers (60 GB, 36 cores) and 1 r4.xlarge (31 GB, 4 cores) driver. The GPU cluster I created contains 3 g4dn.4xlarge (64 GB, 16 cores) nodes, 2 as workers and 1 as driver. The hourly costs are very similar. I assumed the GPU cluster would outperform, since random forest is an algorithm well suited to parallel computing, but the result kinda shocked me: the GPU cluster trained the model nearly 5 times slower than the CPU cluster. Is there anything I misunderstood about GPU acceleration, or is it just not used for &lt;A href="https://pyspark.ml" alt="https://pyspark.ml" target="_blank"&gt;pyspark.ml&lt;/A&gt; modules?&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2022 17:07:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27233#M1561</guid>
      <dc:creator>zzy</dc:creator>
      <dc:date>2022-10-14T17:07:02Z</dc:date>
    </item>
    <item>
      <title>Re: Why is GPU accelerated node much slower than CPU node for training a random forest model on databricks?</title>
      <link>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27235#M1563</link>
      <description>&lt;P&gt;In many cases, you need to adjust your code to utilize the GPU.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 12:40:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27235#M1563</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-20T12:40:36Z</dc:date>
    </item>
    <item>
      <title>Re: Why is GPU accelerated node much slower than CPU node for training a random forest model on databricks?</title>
      <link>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27234#M1562</link>
      <description>&lt;P&gt;Hi @Simon Zhang, could you please go through this: &lt;A href="https://www.databricks.com/session/gpu-support-in-spark-and-gpu-cpu-mixed-resource-scheduling-at-production-scale" alt="https://www.databricks.com/session/gpu-support-in-spark-and-gpu-cpu-mixed-resource-scheduling-at-production-scale" target="_blank"&gt;https://www.databricks.com/session/gpu-support-in-spark-and-gpu-cpu-mixed-resource-scheduling-at-production-scale&lt;/A&gt; and let us know if it addresses your concern?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2022 12:56:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/why-is-gpu-accelerated-node-much-slower-than-cpu-node-for/m-p/27234#M1562</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2022-10-18T12:56:40Z</dc:date>
    </item>
  </channel>
</rss>

