Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by DavideCagnoni (Contributor)
  • 4775 Views
  • 4 replies
  • 1 kudos

How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` DataFrame and try, for example, to scatter-plot two columns, I get only a subset of the desired points. For example, if I plot two columns from a table with 1,000,000 rows, I only see some of the data - i...
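A likely explanation is the plotting row cap in pandas-on-Spark. `pyspark.pandas` exposes two options, `plotting.max_rows` (default 1000) and `plotting.sample_ratio` (default None), which can be changed with `pyspark.pandas.set_option`. The sketch below is a simplified pure-Python model of how they limit the drawn rows, not the library's code. It assumes, and this is an assumption to check against your Spark version, that scatter, bar, barh and pie plots take the first `max_rows` rows, while line and area plots draw a random sample whose fraction defaults to `max_rows / n_rows`. `expected_plotted_rows` is a hypothetical helper name.

```python
# Simplified model of pandas-on-Spark's plotting row limits (assumption, not library code).
DEFAULT_MAX_ROWS = 1000  # default of the "plotting.max_rows" option

TOP_N_KINDS = {"pie", "bar", "barh", "scatter"}  # assumed: first max_rows rows are plotted
SAMPLED_KINDS = {"line", "area"}  # assumed: a random sample is plotted


def expected_plotted_rows(kind, n_rows, max_rows=DEFAULT_MAX_ROWS, sample_ratio=None):
    """Approximate how many of n_rows a plot of the given kind would draw."""
    if n_rows <= 0:
        return 0
    if kind in TOP_N_KINDS:
        # Top-n plots keep only the leading rows.
        return min(n_rows, max_rows)
    if kind in SAMPLED_KINDS:
        # Without an explicit sample_ratio, the fraction is derived from max_rows.
        fraction = sample_ratio if sample_ratio is not None else min(1.0, max_rows / n_rows)
        # Spark sampling is random, so this is only the expected count.
        return round(n_rows * fraction)
    raise ValueError(f"plot kind {kind!r} is not covered by this model")


print(expected_plotted_rows("scatter", 1_000_000))                      # 1000
print(expected_plotted_rows("scatter", 1_000_000, max_rows=1_000_000))  # 1000000
```

Under this model, plotting every point of a 1,000,000-row table means raising the cap before plotting, e.g. `ps.set_option("plotting.max_rows", 1_000_000)` with `import pyspark.pandas as ps`, keeping in mind that all those rows are then collected to the driver.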

Latest reply by DavideCagnoni (Contributor)
  • 1 kudos

@Kaniz Fatma The problem is not about performance or plotly. It is about the pandas_on_spark DataFrame arbitrarily subsampling the input data when plotting, without notifying the user. While subsampling is understandable and maybe even nece...
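Since the subsampling happens silently, one workaround is to check the row count yourself before plotting and raise a warning when the cap will cut the data. The helper below is a hypothetical sketch, not part of `pyspark.pandas`; it assumes the default `plotting.max_rows` cap of 1000 and that you pass in the row count (for instance `len(psdf)`, which triggers a Spark count job).

```python
import warnings


def warn_if_subsampled(n_rows: int, max_rows: int = 1000) -> float:
    """Hypothetical helper: return the fraction of rows a capped plot would show,
    emitting a UserWarning when that fraction is below 1."""
    if n_rows <= 0:
        return 1.0
    fraction = min(1.0, max_rows / n_rows)
    if fraction < 1.0:
        warnings.warn(
            f"Plot will show about {max_rows} of {n_rows} rows ({fraction:.2%}); "
            "raise plotting.max_rows or set plotting.sample_ratio to plot more.",
            UserWarning,
        )
    return fraction


warn_if_subsampled(1_000_000)  # warns: about 1000 of 1000000 rows (0.10%)
```

The choice to warn rather than silently raise the cap is deliberate: plotting every row collects the whole table to the driver, so the user should decide whether that is acceptable.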
