<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I have a single node XGBoost model written in Python. How can I scale it with Spark? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-have-a-single-node-xgboost-model-written-in-python-how-can-i/m-p/25298#M17578</link>
    <description>&lt;P&gt;If you mean distributed training of a single XGBoost model, SparkML has no built-in support for it. SparkML provides &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.classification.GBTClassifier.html?highlight=gbt#pyspark.ml.classification.GBTClassifier" alt="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.classification.GBTClassifier.html?highlight=gbt#pyspark.ml.classification.GBTClassifier" target="_blank"&gt;gradient boosted trees&lt;/A&gt;, but not XGBoost specifically. However, third-party packages such as &lt;A href="https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html" alt="https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html" target="_blank"&gt;XGBoost4J&lt;/A&gt; can train XGBoost models on Spark. At the time of writing, it has no Python API, but you can use it from Scala or Java. See the Databricks &lt;A href="https://docs.databricks.com/applications/machine-learning/train-model/xgboost.html" alt="https://docs.databricks.com/applications/machine-learning/train-model/xgboost.html" target="_blank"&gt;docs&lt;/A&gt; for a more complete example.&lt;/P&gt;&lt;P&gt;If you want to scale hyperparameter tuning instead, you can use Hyperopt to tune single-node XGBoost models in Python. To scale inference rather than training, you can apply the trained model to a Spark DataFrame through a Spark UDF.&lt;/P&gt;</description>
    <pubDate>Thu, 10 Jun 2021 18:41:45 GMT</pubDate>
    <dc:creator>j_weaver</dc:creator>
    <dc:date>2021-06-10T18:41:45Z</dc:date>
    <item>
      <title>I have a single node XGBoost model written in Python. How can I scale it with Spark?</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-single-node-xgboost-model-written-in-python-how-can-i/m-p/25297#M17577</link>
      <description />
      <pubDate>Thu, 10 Jun 2021 17:53:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-single-node-xgboost-model-written-in-python-how-can-i/m-p/25297#M17577</guid>
      <dc:creator>User16788317454</dc:creator>
      <dc:date>2021-06-10T17:53:28Z</dc:date>
    </item>
    <item>
      <title>Re: I have a single node XGBoost model written in Python. How can I scale it with Spark?</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-single-node-xgboost-model-written-in-python-how-can-i/m-p/25298#M17578</link>
      <description>&lt;P&gt;If you mean distributed training of a single XGBoost model, SparkML has no built-in support for it. SparkML provides &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.classification.GBTClassifier.html?highlight=gbt#pyspark.ml.classification.GBTClassifier" alt="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.classification.GBTClassifier.html?highlight=gbt#pyspark.ml.classification.GBTClassifier" target="_blank"&gt;gradient boosted trees&lt;/A&gt;, but not XGBoost specifically. However, third-party packages such as &lt;A href="https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html" alt="https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html" target="_blank"&gt;XGBoost4J&lt;/A&gt; can train XGBoost models on Spark. At the time of writing, it has no Python API, but you can use it from Scala or Java. See the Databricks &lt;A href="https://docs.databricks.com/applications/machine-learning/train-model/xgboost.html" alt="https://docs.databricks.com/applications/machine-learning/train-model/xgboost.html" target="_blank"&gt;docs&lt;/A&gt; for a more complete example.&lt;/P&gt;&lt;P&gt;If you want to scale hyperparameter tuning instead, you can use Hyperopt to tune single-node XGBoost models in Python. To scale inference rather than training, you can apply the trained model to a Spark DataFrame through a Spark UDF.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 18:41:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-single-node-xgboost-model-written-in-python-how-can-i/m-p/25298#M17578</guid>
      <dc:creator>j_weaver</dc:creator>
      <dc:date>2021-06-10T18:41:45Z</dc:date>
    </item>
  </channel>
</rss>

