Databricks Community

Ondrej_Lostak · ‎03-10-2023

When I display a DataFrame and add a visualization, I see a preview based on only a sample of the data, and when I confirm it, it is computed from all of the data. So far, everything is fine. However, when I change the DataFrame, the visualization becomes inconsistent and only considers a sample of the data, so I need to create the visualization again. This makes the visualizations a little bit unfriendly for me.

Is there a way to set up the visualization so that it is consistent with the source data all the time?

Anonymous · ‎03-14-2023

@Ondrej Lostak: I hope I understood your question correctly. If not, please let me know after reading the suggestions below.

When you create a visualization for a DataFrame in Databricks, the preview is generated from a sample of the data. Once you confirm the visualization, it is computed from all of the data, so it should be consistent with the source data.

If you are experiencing inconsistencies with your visualizations after changing the DataFrame, one possible reason could be that the changes you made to the DataFrame affected the distribution or the structure of the data, and thus the visualization needs to be updated accordingly. In this case, you would need to recreate the visualization to ensure it is consistent with the updated DataFrame.
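To check whether a change actually affected the structure or size of the data, a rough comparison like the one below may help. It is only a sketch: `describe_change` is a hypothetical helper name, and it assumes both arguments are PySpark DataFrames (it relies only on the `columns` attribute and the `count()` method).

```python
def describe_change(before, after):
    """Summarize how `after` differs from `before` in columns and row count.

    Hypothetical helper: both arguments are assumed to be PySpark DataFrames.
    """
    before_cols = list(before.columns)
    after_cols = list(after.columns)
    return {
        # Columns present only in the new DataFrame
        "added_columns": [c for c in after_cols if c not in before_cols],
        # Columns present only in the old DataFrame (e.g. dropped or renamed)
        "removed_columns": [c for c in before_cols if c not in after_cols],
        # count() is a Spark action and scans the full data, not a sample
        "rows_before": before.count(),
        "rows_after": after.count(),
    }


# Usage in a notebook cell (assumes `df_old` and `df_new` already exist):
#   print(describe_change(df_old, df_new))
```

If a column used by the visualization shows up under `removed_columns`, the chart settings no longer match the data, so recreating the visualization is likely unavoidable.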

However, if you are only making minor changes to the DataFrame, such as renaming columns or filtering rows, and you want to avoid recreating the visualization every time, you can try calling the cache() method on the DataFrame before creating the visualization. This caches the DataFrame in memory, which can improve performance, and it may also help keep the visualization consistent with the source data after minor changes.
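A minimal sketch of the caching suggestion, assuming a PySpark DataFrame in a Databricks notebook. `cache_and_materialize` is a hypothetical helper name, not part of any Databricks API. Because `cache()` is lazy, the sketch follows it with `count()` so the cache is actually populated. Whether this keeps an existing visualization in sync after the DataFrame changes has not been confirmed in this thread.

```python
def cache_and_materialize(df):
    """Mark a Spark DataFrame for caching and force the cache to be filled.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame.
    """
    cached = df.cache()  # lazy: only marks the DataFrame for caching
    cached.count()       # an action, so Spark computes and stores the rows
    return cached


# Usage in a notebook cell (assumes `df` already exists; `display` is the
# Databricks notebook built-in):
#   cached_df = cache_and_materialize(df)
#   display(cached_df)
```

Note that a transformation such as `filter` or `withColumnRenamed` returns a new DataFrame, so a cell that displays the result of such a change is not displaying the cached object itself.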

Anonymous · ‎03-31-2023

Hi @Ondrej Lostak

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.

Cheers!


Visualization only from sample of data
