How to force pandas_on_spark plots to use all dataframe data?
When I load a table as a `pandas_on_spark` DataFrame and try to, e.g., scatter-plot two columns, I get only a subset of the expected points. For example, if I try to plot two columns from a table with 1,000,000 rows, I only see some of the data - i...
Latest Reply
@Kaniz Fatma The problem is not about performance or plotly. It is about the pandas_on_spark dataframe arbitrarily subsampling the input data when plotting, without notifying the user about it. While subsampling is understandable and maybe even nece...
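For anyone landing here with the same question, a minimal sketch of what seems to be going on, assuming `pandas_on_spark` means the `pyspark.pandas` module. As far as I can tell from the pandas API on Spark options, sample-based plots (line, area, scatter) draw only a fraction of the rows: the `plotting.sample_ratio` option if it is set, otherwise roughly `plotting.max_rows` (documented default 1000) divided by the row count. The function and parameter names below (`effective_sample_fraction`, `scatter_all_points`, `psdf`) are hypothetical and only for illustration; the first function mirrors the documented rule rather than calling Spark.

```python
from typing import Optional


def effective_sample_fraction(
    n_rows: int,
    max_rows: int = 1000,            # assumed default of "plotting.max_rows"
    sample_ratio: Optional[float] = None,  # "plotting.sample_ratio", unset by default
) -> float:
    """Approximate the fraction of rows a sample-based plot would draw.

    Hypothetical helper: it reproduces the documented rule, it does not
    query pyspark.pandas itself.
    """
    if sample_ratio is not None:
        # An explicit ratio takes precedence over max_rows.
        return sample_ratio
    if n_rows <= 0:
        # Nothing to subsample.
        return 1.0
    # Otherwise the ratio is derived from max_rows and capped at 1.
    return min(1.0, max_rows / n_rows)


def scatter_all_points(psdf, x: str, y: str):
    """Scatter-plot every row of a pandas-on-Spark DataFrame.

    Hypothetical helper. Assumes pyspark is installed and that `psdf`
    is a pyspark.pandas DataFrame. The sampled rows are collected to
    the driver, so a ratio of 1.0 may be slow or run out of memory on
    large tables.
    """
    import pyspark.pandas as ps  # imported lazily so the sketch loads without Spark

    # Temporarily request a sampling ratio of 1.0 (all rows) for this plot only.
    with ps.option_context("plotting.sample_ratio", 1.0):
        return psdf.plot.scatter(x=x, y=y)
```

With the defaults, `effective_sample_fraction(1_000_000)` comes out at about 0.001, i.e. on the order of 1000 points, which matches the behaviour described in the question. Setting `plotting.sample_ratio` globally with `ps.set_option` should have the same effect as the context manager, but I have not verified that on every Spark version.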