Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Visualization only from a sample of data

Ondrej_Lostak
New Contributor

When I display a DataFrame and add a visualization, the preview is built from only a sample of the data, and once I confirm it, it is computed from all of the data. So far, so good. However, when I change the DataFrame, the visualization becomes inconsistent and only considers a sample of the data, so I have to create the visualization again. This makes visualizations rather unfriendly to use.

Is there a way to set up the visualization so that it stays consistent with the source data all the time?

2 REPLIES

Anonymous
Not applicable

@Ondrej Lostak: I hope I understood your question correctly. Please let me know if I did not after reading the suggestions below.

When you create a visualization for a DataFrame in Databricks, the preview is generated from a sample of the data. Once you confirm the visualization, it is computed from all of the data, so it should be consistent with the source data.

If your visualizations become inconsistent after you change the DataFrame, one possible reason is that the changes affected the distribution or the structure of the data, so the visualization needs to be updated accordingly. In that case, you would need to recreate the visualization to make sure it matches the updated DataFrame.

However, if you are only making minor changes to the DataFrame, such as renaming columns or filtering rows, and you want to avoid recreating the visualization every time, you can try calling the cache() method on the DataFrame before creating the visualization. This keeps the DataFrame in memory, which improves performance, and it may also help keep the visualization consistent with the source data after minor changes.
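A minimal sketch of the cache() suggestion, in case it helps. The helper name `cache_for_display` and the table name `sales` are made up for illustration, and `spark` and `display` are assumed to be the objects a Databricks notebook predefines. Note that cache() is lazy, so the sketch runs count() to actually populate the cache. It only shows how to cache a DataFrame; it does not verify that the visualization then stays in sync.

```python
def cache_for_display(df):
    """Mark a Spark DataFrame for caching and materialize the cache.

    `df` is expected to be a pyspark.sql.DataFrame; the helper name
    is hypothetical and not part of any Databricks API.
    """
    cached = df.cache()  # lazy: only marks the DataFrame for caching
    cached.count()       # an action, so the cache is actually populated
    return cached


# Usage in a Databricks notebook (assumed environment):
# sales_df = spark.table("sales")        # hypothetical table name
# display(cache_for_display(sales_df))   # build the visualization on this output
```

If the DataFrame is later redefined, the new DataFrame is a different object and would need to be cached and displayed again.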

Anonymous
Not applicable

Hi @Ondrej Lostak

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
