Databricks Community

Artem_Y · ‎10-15-2024

In this post, we'll examine one approach to creating a sparkline in a Databricks Dashboard table.

Approach

As of writing this post, there is no built-in method of creating sparklines in a table, so we need to explore some workarounds. All workarounds involve collecting data into a list and passing it to an SQL query or function that combines the data with renderable HTML or converts it to renderable image data.

The lightweight approach is to wrap the data with HTML and use SVG to render the sparkline. However, SVG can be finicky with its definition and is not straightforward for users who don't have a strong foundation in front-end development. Therefore, we will look at using a Databricks Unity Catalog Python Function that will leverage the more commonly known matplotlib Python library. After the function's definition, all users can simply call it like any other function and get back a visualization.

Setup

The implementation will follow these steps

UC SQL Python Function Creation
(optional) Gold table creation
Application of the function
Setting visualization option

Let's skip to the application of the SQL so we can see how simple it is to apply the function we'll define later in the article.

Let's assume our data looks something like this.

And we want to make a sparkline for each resort and mtn_loc as we see in the first screenshot.

This will require you to preprocess data so that there is a single column containing all the points that the sparkline can use. This can either be handled during gold table creation or written as a SQL query such as the one below.

Let's take a look at what the final SQL will look like after the SQL function is defined

SQL query to create an ordered array column as a CTE.
The data is collected into a map and orderd by week and the value is extracted into a list.
Finally the function is applied on the correct data type.

with collections as (
  SELECT
    resort,
    mtn_loc,
    transform(
      array_sort(
        collect_list(struct(week, bluebird_p :: int as bluebird_p))
      ),
      x -> x.bluebird_p
    ) AS bluebird_p_list
  FROM
    yevtushenko_artem.shreddy.snow_powder_history
  where
    resort = 'Hakuba47'
  GROUP BY ALL
)
select
  * except(bluebird_p_list),
  yevtushenko_artem.default.make_sparkline(bluebird_p_list:: string) as sparkline
from
  collections

Next, let's take a look at how we define make_sparkline.

Creating the Function

This article assumes you have read the documentation for CREATE FUNCTION Documentation.

Execute the following SQL replacing the <catalog> and <schema> components with your own UC values.

CREATE FUNCTION <catalog>.<schema>.make_sparkline(input_data STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  import matplotlib
  import matplotlib.pyplot as plt
  import base64
  from io import BytesIO

  def _sparkline(input_data):
    # removes "[" and "]" characters that may be present
    data = input_data.replace("[", "").replace("]", "")
    data = [int(x) for x in data.split(",")]
    figsize = (4, 0.25)
    fig, ax = plt.subplots(1, 1, figsize=figsize)

    ax.plot(data)
    for k, v in ax.spines.items():
        v.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])

    plt.plot(len(data) - 1, data[-1], 'r.')

    ax.fill_between(range(len(data)), data, [min(data)] * len(data), alpha=0.1)

    img = BytesIO()
    plt.savefig(img, format='png', transparent=True, bbox_inches='tight')
    img.seek(0)
    plt.close()

    return "data&colon;image/png;base64, " + base64.b64encode(img.read()).decode("UTF-8")

  return _sparkline(input_data) if input_data else None
$$;

Thanks to Ben Hayes (ben.hayes@databricks.com) for the python script

Once this function is defined the SQL in the Setup section will execute successfully.

Next, we need to make sure the output can be visualized.

Visualization Creation

This is the output of the SQL query where we apply make_sparkline.

Clearly, this isn't a sparkline.

In a dashboard or query editor visualization builder we will create a table visualization.

The important setting to change under settings for the sparkline column is display as.

Select image from the dropdown in the sparkline column display as options

The sparkline should be visualized appropriately now.

Other Considerations

The Python code is customizable, so you have the flexibility to change the visualization to your liking.
The image is encoded as a string and can be fairly large, so you may want to consider how many records to show at once. Large amounts of data can slow a dashboard down. Experiment with pagination of tables or application of default filters for best performance.
Make sure that you grant usage to the function to users.