Data Engineering

Visualization only from sample of data

Ondrej_Lostak
New Contributor

When I display a dataframe and add a visualization, the preview is computed from only a sample of the data, and when I confirm it, it is computed from all of the data. Up to this point, everything is fine. However, when I change the dataframe, the visualization becomes inconsistent and only considers a sample of the data, so I need to create the visualization again. This makes the visualizations a bit unfriendly for me.

Is there a way to set up the visualization so that it is consistent with the source data all the time?

2 REPLIES

Anonymous
Not applicable

@Ondrej Lostak: I hope I understood your question correctly. Please let me know if I did not after reading the suggestions below.

When you create a visualization for a DataFrame in Databricks, the preview is generated from a sample of the data. Once you confirm the visualization, it is computed from all of the data, so at that point it should be consistent with the source data.

If your visualizations become inconsistent after you change the DataFrame, one possible reason is that the change affected the distribution or structure of the data, so the visualization needs to be updated accordingly. In that case, you would need to recreate the visualization to make sure it is consistent with the updated DataFrame.

However, if you are making minor changes to the DataFrame, such as renaming columns or filtering rows, and you want to avoid recreating the visualization every time, you can try calling the cache() method on the DataFrame before creating the visualization. This caches the DataFrame in memory once an action runs on it and can improve performance. Note that cache() is primarily a performance optimization, so treat it as something to try rather than a guaranteed way to keep the visualization consistent with the source data after such changes.
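A minimal sketch of the cache() suggestion, assuming `df` is a Spark DataFrame in a Databricks notebook. The helper name `cache_and_materialize` is hypothetical. Because cache() is lazy, the sketch calls count() afterwards so the data is actually cached before the visualization is created. Whether this changes how the visualization samples data is not confirmed in this thread.

```python
def cache_and_materialize(df):
    """Mark a Spark DataFrame for caching and force it to be computed once."""
    cached = df.cache()  # lazy: only marks the DataFrame for caching
    cached.count()       # an action, so the cache is actually populated
    return cached


# In a Databricks notebook, where `df` and `display` already exist:
# df = cache_and_materialize(df)
# display(df)  # then add or re-check the visualization on this output
```

If the DataFrame is later redefined, for example by a new filter, the new DataFrame is a different object and would need to be cached again.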

Anonymous
Not applicable

Hi @Ondrej Lostak

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
