<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Installing pyspark.pandas in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/installing-pyspark-pandas/m-p/30177#M1626</link>
    <description>&lt;P&gt;Hello guys,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm trying to migrate a Python project from pandas to &lt;A href="https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html" alt="https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html" target="_blank"&gt;Pandas API on Spark&lt;/A&gt;, on Azure Databricks using MLflow in a conda env.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, I'm getting the following error:&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "/databricks/mlflow/projects/x/data_validation.py", line 13, in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;import pyspark.pandas as ps&lt;/P&gt;&lt;P&gt;ModuleNotFoundError: No module named 'pyspark.pandas'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which, as I understand it, includes Apache Spark 3.2.1, and I've seen that Pandas API on Spark has been included since 3.2.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also tried installing it from the config file I use to create the conda env, but that didn't work either &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 29 Sep 2022 11:25:15 GMT</pubDate>
    <dc:creator>DiCamps</dc:creator>
    <dc:date>2022-09-29T11:25:15Z</dc:date>
    <item>
      <title>Installing pyspark.pandas</title>
      <link>https://community.databricks.com/t5/machine-learning/installing-pyspark-pandas/m-p/30177#M1626</link>
      <description>&lt;P&gt;Hello guys,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm trying to migrate a Python project from pandas to &lt;A href="https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html" alt="https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html" target="_blank"&gt;Pandas API on Spark&lt;/A&gt;, on Azure Databricks using MLflow in a conda env.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, I'm getting the following error:&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "/databricks/mlflow/projects/x/data_validation.py", line 13, in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;import pyspark.pandas as ps&lt;/P&gt;&lt;P&gt;ModuleNotFoundError: No module named 'pyspark.pandas'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which, as I understand it, includes Apache Spark 3.2.1, and I've seen that Pandas API on Spark has been included since 3.2.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also tried installing it from the config file I use to create the conda env, but that didn't work either &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Sep 2022 11:25:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/installing-pyspark-pandas/m-p/30177#M1626</guid>
      <dc:creator>DiCamps</dc:creator>
      <dc:date>2022-09-29T11:25:15Z</dc:date>
    </item>
    <item>
      <title>Re: Installing pyspark.pandas</title>
      <link>https://community.databricks.com/t5/machine-learning/installing-pyspark-pandas/m-p/30178#M1627</link>
      <description>&lt;P&gt;It should be, yes.&lt;/P&gt;&lt;P&gt;Can you elaborate on how you create your notebook (and the conda env you mentioned)?&lt;/P&gt;</description>
      <pubDate>Thu, 29 Sep 2022 12:40:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/installing-pyspark-pandas/m-p/30178#M1627</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-29T12:40:36Z</dc:date>
    </item>
  </channel>
</rss>

