<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the proper way to import the new pyspark.pandas library? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12303#M7124</link>
    <description>&lt;P&gt;Official issues are the best way to go. That way you can reference them in the future and show your work off!&lt;/P&gt;</description>
    <pubDate>Sat, 30 Oct 2021 23:32:45 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2021-10-30T23:32:45Z</dc:date>
    <item>
      <title>What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12293#M7114</link>
      <description>&lt;P&gt;I am moving an existing, working pandas program into Databricks. I want to use the new pyspark.pandas library, and change my code as little as possible. It appears that I should do the following:&lt;/P&gt;&lt;P&gt;1) Add &lt;B&gt;from pyspark import pandas as ps&lt;/B&gt; at the top&lt;/P&gt;&lt;P&gt;2) Change all occurrences of &lt;B&gt;pd.pandas_function&lt;/B&gt; to &lt;B&gt;ps.pandas_function&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Is this correct?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Oct 2021 17:00:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12293#M7114</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-10-27T17:00:17Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12294#M7115</link>
      <description>&lt;P&gt;Yes, that is a great start. Right now code coverage is at 83%, and we are shooting for 90%. Please &lt;A href="https://issues.apache.org/jira/projects/SPARK" alt="https://issues.apache.org/jira/projects/SPARK" target="_blank"&gt;file an issue&lt;/A&gt;&amp;nbsp;if you find a gap you need filled. Please note: pandas on Spark (Koalas) does some interesting things with the way it distributes the indexes of large dataframes. It may be good to review the talk starting at 20:20 here: &lt;A href="https://databricks.com/session_na20/koalas-making-an-easy-transition-from-pandas-to-apache-spark" target="test_blank"&gt;https://databricks.com/session_na20/koalas-making-an-easy-transition-from-pandas-to-apache-spark&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 03:21:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12294#M7115</guid>
      <dc:creator>Dan_Z</dc:creator>
      <dc:date>2021-10-28T03:21:55Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12295#M7116</link>
      <description>&lt;P&gt;Use &lt;B&gt;import pyspark.pandas as ps&lt;/B&gt;, but since code coverage is 83%, as @Dan Zafar&amp;nbsp;said, there is no guarantee that your old code will work. You can find more in this blog post: &lt;A href="https://databricks.com/blog/2021/10/04/pandas-api-on-upcoming-apache-spark-3-2.html" target="test_blank"&gt;https://databricks.com/blog/2021/10/04/pandas-api-on-upcoming-apache-spark-3-2.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 09:28:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12295#M7116</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-28T09:28:31Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12296#M7117</link>
      <description>&lt;P&gt;Thank you. I am porting this code and will file issues as I find them: &lt;A href="https://medium.com/@chuck.connell.3/vaccines-vs-mortality-correlation-at-the-county-level-922a10236a93" alt="https://medium.com/@chuck.connell.3/vaccines-vs-mortality-correlation-at-the-county-level-922a10236a93" target="_blank"&gt;https://medium.com/@chuck.connell.3/vaccines-vs-mortality-correlation-at-the-county-level-922a10236a93&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 13:38:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12296#M7117</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-10-28T13:38:25Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12297#M7118</link>
      <description>&lt;P&gt;I have read that blog post. It was helpful, but it implied that I can move code from pandas on a laptop to pandas on Spark by changing &lt;I&gt;one line&lt;/I&gt; of code, which does not seem to be true.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 13:40:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12297#M7118</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-10-28T13:40:58Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12298#M7119</link>
      <description>&lt;P&gt;That's right; expect some minor refactoring.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 15:24:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12298#M7119</guid>
      <dc:creator>Dan_Z</dc:creator>
      <dc:date>2021-10-28T15:24:29Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12299#M7120</link>
      <description>&lt;P&gt;Make sure to use the 10.0 Runtime, which includes Spark 3.2.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 10:04:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12299#M7120</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-10-30T10:04:31Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12300#M7121</link>
      <description>&lt;P&gt;Yes, did that. I am now porting the code, finding workarounds to problems, and keeping a list of issues. Will write it all up as an article on Medium.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 12:40:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12300#M7121</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-10-30T12:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12301#M7122</link>
      <description>&lt;P&gt;Cool, make sure to link it here when you're finished.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 21:00:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12301#M7122</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-10-30T21:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12302#M7123</link>
      <description>&lt;P&gt;Yes. Do you want me to create official issues at apache/spark, or let someone on your team do it from my notes?&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 22:30:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12302#M7123</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-10-30T22:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12303#M7124</link>
      <description>&lt;P&gt;Official issues are the best way to go. That way you can reference them in the future and show your work off!&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 23:32:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12303#M7124</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-10-30T23:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: What is the proper way to import the new pyspark.pandas library?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12304#M7125</link>
      <description>&lt;P&gt;I created some issues, such as this one. Please let me know if I did anything wrong, so I can fix them. Thanks!&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SPARK-37180" alt="https://issues.apache.org/jira/browse/SPARK-37180" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-37180&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Nov 2021 15:01:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-proper-way-to-import-the-new-pyspark-pandas-library/m-p/12304#M7125</guid>
      <dc:creator>cconnell</dc:creator>
      <dc:date>2021-11-01T15:01:44Z</dc:date>
    </item>
  </channel>
</rss>

