Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Notebook Visualisations suddenly not working

117074
New Contributor III

Hi all,

I have a Python script which runs SQL against our Delta Live Tables and returns a pandas dataframe. I do this multiple times and then call 'display(pandas_dataframe)' on each result. Once the output is displayed, I create a visualization from the UI, which is then added to a dashboard.

I've run this script many times with no issues, but in the past 2 hours the behaviour of the notebook seems to have changed. When I run 'display(pandas_dataframe)', the output no longer comes back in a form that visualisations can be created from. Instead it looks as if I had called print(pandas_dataframe).

Any advice would be great, thank you. 

 

More info:

I'm using Databricks at work through my company's cloud set-up, on AWS.
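
For readers following along, the workflow described above could look roughly like the sketch below. It is only an illustration under assumptions: the helper name `run_queries`, the query names, and the table names are hypothetical, and `spark` and `display` are the objects a Databricks notebook normally provides. The original script is not shown in this post.

```python
def run_queries(spark, queries):
    """Run each SQL string and return a dict of {name: pandas DataFrame}.

    `spark` is expected to be a SparkSession (in a Databricks notebook,
    the predefined `spark` variable). `queries` maps a label to a SQL string.
    """
    results = {}
    for name, sql in queries.items():
        # spark.sql() returns a Spark DataFrame; toPandas() collects it
        # to the driver as a pandas DataFrame.
        results[name] = spark.sql(sql).toPandas()
    return results


# In a notebook cell (table name is hypothetical):
#   frames = run_queries(spark, {"daily_totals": "SELECT * FROM my_schema.daily_totals"})
#   display(frames["daily_totals"])
```

Passing `spark` in as an argument, rather than relying on the notebook global, keeps the helper easy to exercise outside a notebook.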

3 REPLIES

117074
New Contributor III

Update: After restarting the cluster, the problem seems to have resolved. However, I still need to find an explanation for why this happened, as restoring the graphs has created additional work.

Kaniz_Fatma
Community Manager

Hi @117074, it sounds like you’re encountering unexpected behaviour in your Databricks notebook when using display(pandas_dataframe) to visualize your data.

Let’s explore some potential solutions:

  1. Check Your Imports:

    • Ensure that you have the necessary imports at the beginning of your notebook. Specifically, make sure you’ve imported pandas and any other relevant libraries for visualization (e.g., matplotlib, seaborn, etc.).
  2. Matplotlib Show():

    • display(pandas_dataframe) is rendered by the Databricks notebook itself as an interactive table with chart options. It does not go through Matplotlib, so a missing plot call would not explain the change you are seeing.
    • plt.show() (assuming you’ve imported matplotlib.pyplot as plt) only applies to figures you draw yourself with Matplotlib. It is worth adding there, but it will not affect the output of display(pandas_dataframe).
  3. Check for Errors or Warnings:

    • Look for any error messages or warnings in your notebook. These can provide clues about what might be going wrong.
    • Check the Databricks logs or console for any relevant information.
  4. Kernel Restart:

    • Sometimes, notebook kernels can get into a state where they don’t display plots correctly. Try restarting the kernel and running your code again.
    • You can restart the kernel by clicking on the “Restart” button in the Databricks notebook toolbar.
  5. Inspect the Dataframe:

    • Before calling display(pandas_dataframe), verify that your Pandas dataframe (pandas_dataframe) contains the expected data.
    • Print a few rows of the dataframe using print(pandas_dataframe.head()) to ensure that the data is loaded correctly.
  6. Check Dashboard Configuration:

    • If you’re adding the visualization to a dashboard, ensure that the configuration settings (such as chart type, axes, etc.) are correctly set up.
    • Double-check any filters or parameters that might affect the visualization.
  7. Cache and Materialization:

    • Spark and Databricks can cache data between runs. If you’re re-running the same script multiple times, it’s possible that cached data is affecting the results.
    • Consider calling .cache() or .unpersist() on the underlying Spark DataFrame to control caching explicitly. These are Spark DataFrame methods; a pandas dataframe has no equivalent.
  8. Update Databricks Runtime:

    • Sometimes issues can arise due to the Databricks runtime version. Check if there have been any recent updates or changes to the runtime.
    • Consider updating to the latest stable version if you’re not already using it.
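
As a concrete form of the import check in item 1, here is a minimal sketch. It rests on an assumption the thread does not confirm as the cause: that somewhere in the notebook the name `display` was rebound, for example by `from IPython.display import display`, so that calls no longer reach the Databricks built-in. The helper names `display_origin` and `is_ipython_display` are made up for this example.

```python
def display_origin(fn):
    """Return the name of the module that defines the given callable."""
    return getattr(fn, "__module__", None) or "<unknown>"


def is_ipython_display(fn):
    """Return True if the callable is defined inside the IPython package."""
    return display_origin(fn).split(".")[0] == "IPython"


# In a notebook cell you would inspect the current binding with:
#   print(display_origin(display))
#   print(is_ipython_display(display))
```

If the second call reports True, removing the import (or clearing the notebook state) and re-running the cell is a cheap thing to try before restarting the cluster.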

If none of these solutions works, consider reaching out for further assistance. Good luck! 🚀

 

117074
New Contributor III

Thank you for the detailed response Kaniz, I appreciate it! I do think it may have been a cache issue, since no Spark computation was happening when I ran the cells at the time the error occurred.

It did lead me down a train of thought: is it possible to extract the code used to generate the graphs from the visualisation UI? That way, instead of doing display(pandas_df) and manually making the visualisations myself, I can just reuse this code. Alternatively, I may look into using some plotly graphs!

Thank you
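
On the plotly idea, a minimal sketch of keeping a chart definition in code rather than in the UI. It builds the figure description as a plain dict with the `data` and `layout` keys that Plotly figures use. The helper name `bar_chart_spec` and the sample rows are invented for illustration, and the commented-out lines assume plotly is installed on the cluster.

```python
def bar_chart_spec(records, x, y, title=""):
    """Build a Plotly-style bar chart description as a plain dict.

    `records` is a list of dicts, e.g. pandas_df.to_dict("records").
    """
    return {
        "data": [
            {
                "type": "bar",
                "x": [row[x] for row in records],
                "y": [row[y] for row in records],
            }
        ],
        "layout": {
            "title": {"text": title},
            "xaxis": {"title": {"text": x}},
            "yaxis": {"title": {"text": y}},
        },
    }


# Sample rows standing in for a query result (made-up values).
rows = [{"day": "Mon", "orders": 12}, {"day": "Tue", "orders": 7}]
spec = bar_chart_spec(rows, x="day", y="orders", title="Orders per day")

# In a notebook with plotly available (assumption):
#   import plotly.graph_objects as go
#   go.Figure(spec).show()
```

Because the chart lives in the notebook source, it is recreated on every run and does not have to be rebuilt by hand.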
