Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using built-in display method modules

holunder42
New Contributor II

The built-in `display` function is very helpful, but we're moving code from notebooks into Python modules.

There, it seems that `display` is resolved differently, which results in poor visualization.

Example:

```
df = spark.createDataFrame([{'x': 1}])
display(df)
```
--> nice visual output
 
But when I move the display code to a module:

```
def df_show(df):
    display(df)
```

and use it via

```
from foo import df_show
df_show(df)
```

it only shows the `__repr__` output.
 
My target picture:
- have a wrapper for all calls of `display` which:
  * uses the built-in `display` when running on Databricks
  * uses a custom print when running in a Python environment outside of Databricks
 
Thanks for your support
 
1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

@holunder42, I did some digging and here is what I found. Hopefully, it will help you further troubleshoot the issue.

Here’s what’s going on and how to solve it.

Why display behaves differently in a module

On Databricks, display is a notebook-scoped helper injected by the runtime. It lives in the notebook’s IPython namespace, not in your module’s global namespace.

When you write this inside a module:

```
# foo.py
def df_show(df):
    display(df)
```

and then do:

```
from foo import df_show
df_show(df)
```

Python tries to resolve display in this order:

1. Local scope inside df_show
2. The module's global scope (foo's globals)
3. Builtins

It does not look in the notebook’s globals, where Databricks has put display. So from the module’s point of view, display just doesn’t exist. The “plain repr output” you’re seeing is effectively the fallback behavior (for example print or df.show() in your wrapper), not the Databricks visual display.

To get notebook-style visuals from code in a module, you need to inject the display function into the module code, or otherwise abstract it.
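The lookup order above can be demonstrated without Databricks at all. This minimal sketch builds a throwaway module whose function references a name that exists only in the caller's namespace, analogous to the notebook-scoped helper:

```python
import types

# Build a throwaway "module" with its own global namespace, whose function
# references a name called `display` that the module itself never defines.
mod = types.ModuleType("foo")
exec("def df_show(df):\n    display(df)", mod.__dict__)

# Define `display` in the CALLER's namespace only, like the notebook-scoped
# helper Databricks injects into the notebook's globals.
def display(obj):
    print("rich view:", obj)

# The module function resolves names via its own globals and builtins,
# so it cannot see the caller's `display`.
try:
    mod.df_show(42)
    resolved = True
except NameError as err:
    resolved = False
    print(err)  # name 'display' is not defined
```

This is exactly why the module falls back to plain text: from `foo`'s point of view, the notebook's `display` simply does not exist.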

Recommended pattern: pass display as a dependency

Treat display as a UI dependency that you pass in from the notebook when you have it, and fall back to a simple print/show when you don’t.

foo.py (module):

```
# foo.py

def df_show(df, visualizer=None):
    """
    visualizer: a callable like Databricks `display` or IPython.display.display.
    If None, fall back to a simple text representation.
    """
    if visualizer is not None:
        visualizer(df)
    else:
        # Fallback for non-Databricks environments
        try:
            # Nice tabular output in many console contexts
            df.show()
        except AttributeError:
            # Generic fallback
            print(df)
```

On Databricks notebook:

```
from foo import df_show

df = spark.createDataFrame([{"x": 1}])

# Pass the notebook's `display` into your module
df_show(df, display)
```

Outside Databricks (plain Python / script):

```
from foo import df_show
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([{"x": 1}])

# No Databricks display available → falls back to df.show()
df_show(df)
```

This gives you:

- Databricks: full rich notebook visualization
- Non-Databricks: readable textual output via df.show() or print
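The injection pattern above is also easy to sanity-check locally. In this sketch, a recording stub stands in for `display` and a small class with a `.show()` method stands in for a Spark DataFrame (all names here are illustrative):

```python
# Stand-alone check of the visualizer-injection pattern: a recording stub
# plays the role of Databricks `display`, and a class with a .show() method
# plays the role of a DataFrame.
def df_show(df, visualizer=None):
    if visualizer is not None:
        visualizer(df)
    else:
        try:
            df.show()
        except AttributeError:
            print(df)

class FakeDF:
    def __init__(self):
        self.shown = False
    def show(self):
        self.shown = True

seen = []
df = FakeDF()

df_show(df, visualizer=seen.append)  # injected path: stub receives the frame
df_show(df)                          # fallback path: .show() is called

print(seen == [df], df.shown)  # → True True
```

Because the visualizer is just a callable parameter, unit tests never need a Databricks runtime at all.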

 

 

Optional: central “smart display” wrapper

If you want to centralize this logic because you call it many times, define a “smart display” once and pass it around:

```
# foo.py

def make_smart_display(visualizer=None):
    def _smart(obj):
        if visualizer is not None:
            visualizer(obj)
        else:
            try:
                obj.show()
            except AttributeError:
                print(obj)
    return _smart

def df_show(df, smart_display):
    smart_display(df)
```

Usage on Databricks:

```
from foo import make_smart_display, df_show

smart_display = make_smart_display(display)
df_show(df, smart_display)
```

Usage off Databricks:

```
from foo import make_smart_display, df_show

smart_display = make_smart_display()  # no visualizer → fallback
df_show(df, smart_display)
```

Design takeaway

The core idea is: keep UI concerns (like Databricks display) at the notebook boundary and inject them into reusable modules, rather than hard-coding display inside your modules. That way your code works cleanly both on Databricks and in any plain Python environment.

Cheers, Lou


4 REPLIES

balajij8
Contributor

`display` is for interactive exploration only and is not intended for use in modular Python code. You can use AI/BI dashboards for visualizations.


holunder42
New Contributor II

Thanks Lou, based on your feedback I found another way.

When importing

```
from databricks.sdk.runtime import display
```

I can force Python to use the Databricks implementation even in modules.
 
The full code results in

```
import os

if "DATABRICKS_RUNTIME_VERSION" in os.environ:
    from databricks.sdk.runtime import display as db_display
else:
    db_display = None  # pylint: disable=invalid-name

def df_show(data) -> None:
    """Wrapper to display dataframes in both Databricks and local environments."""
    if db_display is not None:
        db_display(data)
    else:
        data.show(10)  # df.show() prints and returns None, so don't wrap it in print()
```

This solves the initial question. Now we need to decide whether to keep logic and visualization separate.
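The detection branch in that wrapper can be exercised locally by factoring it into a small helper. `DATABRICKS_RUNTIME_VERSION` is the environment variable Databricks Runtime sets on clusters; outside Databricks it is absent, so the helper should take the fallback path (the helper name here is illustrative):

```python
import os

# Factor the environment detection into a helper so the fallback branch
# can be tested without a Databricks cluster.
def pick_display():
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        # Only reachable on Databricks (requires databricks-sdk + runtime).
        from databricks.sdk.runtime import display
        return display
    return None

# Locally, the variable is absent, so the helper signals "use the fallback".
os.environ.pop("DATABRICKS_RUNTIME_VERSION", None)
print(pick_display())  # → None
```

Keeping the import inside the Databricks branch means the module stays importable in environments where the SDK runtime isn't installed.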

 

SteveOstrowski
Databricks Employee

Hi @holunder42,

The behavior you are seeing is expected. The display() function is not a standard Python built-in. It is injected into the notebook's global namespace by the Databricks runtime when a notebook cell executes. When you move code into an imported Python module (.py file), that module has its own namespace and does not automatically inherit display(), spark, dbutils, or other notebook-scoped objects. That is why your module falls back to the standard __repr__ output.

RECOMMENDED APPROACH: PASS DISPLAY AS A PARAMETER

The cleanest pattern is to pass the display function (or any notebook-scoped object) into your module as an argument:

In your module file (e.g., my_utils.py):

```
def show_data(df, display_fn=None):
    if display_fn is not None:
        display_fn(df)
    else:
        # Fallback for standard Python environments
        print(df.toPandas().to_string())
```

In your notebook:

```
from my_utils import show_data

df = spark.table("my_catalog.my_schema.my_table")
show_data(df, display_fn=display)
```

This keeps your module portable. When running inside a Databricks notebook, you pass the built-in display. When running outside (unit tests, local development), the fallback kicks in.

ALTERNATIVE: LOOK UP DISPLAY AT RUNTIME

You can also detect whether you are running inside a Databricks notebook and grab display from the IPython environment:

```
from IPython import get_ipython

def get_display():
    try:
        from IPython.display import display as ipython_display
        # Check if the Databricks-enhanced display is available
        shell = get_ipython()
        if hasattr(shell, 'user_ns') and 'display' in shell.user_ns:
            return shell.user_ns['display']
        return ipython_display
    except Exception:
        return print

def show_data(df):
    display_fn = get_display()
    display_fn(df)
```

In this approach, get_ipython().user_ns gives you access to the notebook's namespace, which includes the Databricks-enhanced display function that renders rich table output and charts.

ALTERNATIVE: USE IPYTHON.DISPLAY DIRECTLY

If you only need basic rendering (not the full Databricks rich table with chart options), the IPython.display module works from imported modules:

```
from IPython.display import display, HTML

def show_html(html_string):
    display(HTML(html_string))
```

This renders HTML output in the notebook cell. However, it does not give you the Databricks-specific table visualization with sorting, filtering, and chart creation. For that, you need the notebook-scoped display function.

SUMMARY

- display() is notebook-scoped, not available by default in imported modules
- Best practice: pass display as an argument to your module functions
- Runtime lookup via get_ipython().user_ns['display'] also works
- IPython.display provides basic rendering but not the full Databricks visualization

Docs reference for working with Python modules in notebooks:
https://docs.databricks.com/en/files/workspace-modules.html

Docs reference for IPython kernel support in Databricks:
https://docs.databricks.com/en/notebooks/ipython-kernel.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.