<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to force pandas_on_spark plots to use all dataframe data? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28698#M20475</link>
    <description>&lt;P&gt;Hello, @Davide Cagnoni&amp;nbsp;- It's nice to meet you! My name is Piper, and I'm a moderator for the community. Thank you for bringing this question to us. Let's give your peers a chance to respond and we'll come back if we need to.&lt;/P&gt;</description>
    <pubDate>Fri, 11 Feb 2022 16:22:34 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-02-11T16:22:34Z</dc:date>
    <item>
      <title>How to force pandas_on_spark plots to use all dataframe data?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28697#M20474</link>
      <description>&lt;P&gt;When I load a table as a `pandas_on_spark` dataframe and try to, e.g., scatterplot two columns, I get only a subset of the desired points.&lt;/P&gt;&lt;P&gt;For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data. It looks like the first 1000 rows, but maybe I am influenced by the behavior of the `display` function on Spark dataframes, which states that it uses only the first 1000 rows if the table has more.&lt;/P&gt;&lt;P&gt;Is it possible either to force the plot to show all the data, or at least to know how much of the total data is being plotted?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 09:09:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28697#M20474</guid>
      <dc:creator>DavideCagnoni</dc:creator>
      <dc:date>2022-02-11T09:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to force pandas_on_spark plots to use all dataframe data?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28698#M20475</link>
      <description>&lt;P&gt;Hello, @Davide Cagnoni&amp;nbsp;- It's nice to meet you! My name is Piper, and I'm a moderator for the community. Thank you for bringing this question to us. Let's give your peers a chance to respond and we'll come back if we need to.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 16:22:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28698#M20475</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-11T16:22:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to force pandas_on_spark plots to use all dataframe data?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28700#M20477</link>
      <description>&lt;P&gt;@Kaniz Fatma&amp;nbsp;I need to use Plotly so that I can interact with the graph (zoom in, etc.), so this doesn't solve my problem...&lt;/P&gt;</description>
      <pubDate>Mon, 21 Feb 2022 15:57:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28700#M20477</guid>
      <dc:creator>DavideCagnoni</dc:creator>
      <dc:date>2022-02-21T15:57:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to force pandas_on_spark plots to use all dataframe data?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28701#M20478</link>
      <description>&lt;P&gt;@Davide Cagnoni&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a limitation of Databricks notebooks: they do not support interacting with graphs.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 15:24:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28701#M20478</guid>
      <dc:creator>User16255483290</dc:creator>
      <dc:date>2022-03-02T15:24:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to force pandas_on_spark plots to use all dataframe data?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28703#M20480</link>
      <description>&lt;P&gt;@Kaniz Fatma&amp;nbsp; The problem is not about performance or Plotly. It is about the pandas_on_spark dataframe arbitrarily &lt;U&gt;subsampling the input data&lt;/U&gt; when plotting, &lt;B&gt;&lt;U&gt;without notifying the user about it.&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;While subsampling is understandable and maybe even necessary sometimes, at least a notification like the one shown when you `display(table)` would be useful.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2022 08:07:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-force-pandas-on-spark-plots-to-use-all-dataframe-data/m-p/28703#M20480</guid>
      <dc:creator>DavideCagnoni</dc:creator>
      <dc:date>2022-03-03T08:07:53Z</dc:date>
    </item>
  </channel>
</rss>

