cancel
Showing results for 
Search instead for 
Did you mean: 
Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
cancel
Showing results for 
Search instead for 
Did you mean: 

From Tableau to Databricks: Migrating KPI Dashboards with Metric Views

Rojaboina
New Contributor II

How to extract Tableau calculated fields, dimensions, and measures from a .twbx workbook and re-express them as a production-grade Databricks Metric View YAML — with the Sample Superstore dataset as a complete worked example, accelerated by AI coding assistants.

 

Dataset

Sample Superstore

Source

KPI_Designs.twbx

Platform

Databricks Unity Catalog

 

1. Introduction

Tableau remains one of the most widely deployed BI tools in the enterprise, and its .twbx workbook format is a rich artefact: it bundles not just visualisations but also a full semantic layer — calculated fields, period-over-period logic, dimension hierarchies, and target thresholds — all expressed in Tableau's formula language.

Databricks Metric Views, available in Unity Catalog as part of the AI/BI surface, offer a YAML-native way to declare that same semantic layer inside your lakehouse. Once registered, a Metric View becomes a reusable, queryable object that powers Genie conversations, AI/BI Dashboards, and any SQL client connected to your warehouse — without rebuilding logic in every downstream tool.

This article walks through a complete migration using KPI_Designs.twbx — a publicly available Tableau workbook built on the Sample Superstore dataset — as the source, and produces a working version: 0.1 Metric View YAML that was validated live in a Databricks workspace. The translation itself was accelerated using an AI coding assistant; the techniques generalise to Claude, Cursor, GitHub Copilot, or Databricks Assistant / Genie Code.

 

2. Significance of Metric Views in Databricks

2.1  The Semantic Layer Problem

Every analytics organisation eventually runs into the same issue: the same metric is calculated differently in different tools. The marketing team's Tableau workbook defines "Profit Ratio" as SUM(Profit)/SUM(Sales) for the current month. The finance team's Looker explore defines it as AVG(profit_margin) across all rows. Both are called "Profit Ratio" on executive dashboards. Neither team knows the other exists.

This is the semantic layer problem — the gap between raw data in a warehouse and the agreed-upon business definitions that should govern every report. Metric Views solve it by making those definitions a first-class citizen of the lakehouse, stored in Unity Catalog alongside the tables they describe.

 

2.2  What a Metric View Actually Is

A Metric View is a Unity Catalog object defined in YAML. It references a source table, declares typed dimensions (the axes by which you slice data), and declares measures (the aggregations you care about). Measures can reference other measures using the MEASURE() function, enabling compound calculations without repeating SQL.

# Minimal metric view structure

version: 0.1

 

source: catalog.schema.table

 

dimensions:

  - name: Order Date

    expr: order_date

 

measures:

  - name: Sales

    expr: SUM(sales)

 

  - name: Profit Ratio

    expr: SUM(profit) / NULLIF(SUM(sales), 0)

 

2.3  Why They Matter

Metric Views deliver four concrete benefits for data engineering teams:

  • Single source of truth. Business logic lives in one YAML file under version control, not scattered across Tableau calculated fields, dbt metrics, and ad-hoc SQL notebooks.
  • Natural language querying. Databricks Genie can reason over registered Metric Views, letting business users ask questions in plain English and receive answers backed by your governed definitions.
  • AI/BI Dashboard compatibility. Metric Views are the native data source for Databricks AI/BI Dashboards, enabling auto-generated visualisations with correct aggregation semantics.
  • Governance and lineage. Unity Catalog tracks Metric View dependencies, so column-level lineage runs from raw table through dimension/measure to dashboard cell.

 

Key insight

Migrating from Tableau is not just about moving charts — it is about promoting your semantic layer (the business logic inside calculated fields) from a proprietary workbook format into an open, versionable, lakehouse-native standard.

 

3. Anatomy of a .twbx File

A .twbx file is a ZIP archive. Unzipping it reveals a .twb XML document alongside any packaged data extracts (.hyper files) and images. The .twb is where all the metric logic lives.

# Inspect the archive contents

unzip -l KPI_Designs.twbx

 

# Output:

#   KPI Designs.twb          ← all metric logic in XML

#   Data/TEMP_*.hyper        ← packaged data extract

#   Image/Mini.png           ← workbook thumbnail

The .twb XML has a predictable structure. The elements relevant to metric migration are:

  • <datasource>  — defines the source connection and all calculated fields
  • <column>  — each calculated field with its name, datatype, role (dimension/measure), and formula
  • <worksheet>  — individual sheet definitions that reference columns
  • <dashboard>  — layout containers that assemble worksheets

For KPI_Designs.twbx specifically, the workbook contains 50+ calculated fields covering Sales, Profit, Profit Ratio, and Customers — each with last-month, last-year, YTD, QTD, and running-total variants. These calculated fields are the raw material that needs to be translated into Metric View measures.

 

4. Mapping Tableau Concepts to Metric View YAML

4.1  Concept Mapping Table

Most Tableau concepts have a clean Metric View equivalent, but a few — notably LOD expressions, table calculations, and parameters — require deliberate adaptation rather than direct translation.

Tableau Concept

Databricks Metric View Equivalent

Dimension field

dimensions[] entry — name plus expr referencing a column or SQL expression used to slice measures.

Measure field

measures[] entry — name plus expr containing an aggregation such as SUM, COUNT DISTINCT, or AVG.

Calculated dimension

dimensions[].expr with an arbitrary SQL expression, e.g. DATE_TRUNC('MONTH', order_date).

Calculated measure

measures[].expr containing the aggregation, optionally referencing other measures via MEASURE(`name`).

FIXED LOD expression

Scalar subquery works for global aggregates like {MAX([Order Date])}. Row-level FIXED LODs like {FIXED [Sub-Category] : SUM([Sales])} need window functions in a derived view or a separate aggregated table — they do not have a direct one-line equivalent.

Table calculation

Window function inside the measure expr (e.g. SUM(...) OVER (ORDER BY ...)). Behaviour differs from Tableau, which computes after the visualisation partition is formed — validate results carefully.

Parameter (user input)

No native equivalent. Either hardcode a sensible default in the measure (e.g. * 1.10 growth assumption) or expose the value as a filter applied at query/dashboard time in AI/BI.

Date filter / TF field

CASE WHEN inside the measure expr with bounds computed via ADD_MONTHS and DATE_TRUNC against the latest data date.

Set / group

Either a derived dimension with a CASE WHEN, or a join to a mapping table declared as a separate source.

 

4.2  Translating Period Logic

The most intricate part of the Superstore workbook is the period-over-period logic. Tableau achieves this with Level of Detail (LOD) expressions and boolean "time filter" calculated fields. In a Metric View, you replicate the global-anchor variant with CASE WHEN inside an aggregation, where the anchor date is computed via a scalar subquery against the source table.

# Tableau calculated field: Sales | LM

# IF [Date TF] THEN [Sales] END

# where [Date TF] checks if order_date falls in the most recent complete month

 

# Databricks Metric View equivalent:

- name: Sales LM

  expr: |

    SUM(CASE

      WHEN order_date >= ADD_MONTHS(

             DATE_TRUNC('MONTH',

               (SELECT MAX(order_date) FROM default.sample_superstore)), -1)

       AND order_date <  DATE_TRUNC('MONTH',

               (SELECT MAX(order_date) FROM default.sample_superstore))

      THEN sales

    END)

The scalar subquery resolves the anchor date dynamically at query time, which mirrors how Tableau's {MAX([Order Date])} LOD expression behaves at the data source level. The same pattern with ADD_MONTHS(..., -13) and ADD_MONTHS(..., -12) produces the same-month-last-year window.

 

4.3  Using MEASURE() for Derived Metrics

Compound metrics like Profit Ratio or percentage-change deltas should reference already-defined measures rather than repeating aggregation SQL. Databricks provides the MEASURE() function for this purpose, both inside the YAML and when querying the view downstream:

# Profit Ratio — derived from two other measures

- name: Profit Ratio LM

  expr: MEASURE(`Profit LM`) / NULLIF(MEASURE(`Sales LM`), 0)

 

# Period-over-period delta

- name: Sales % Change LMLY

  expr: >

    (MEASURE(`Sales LM`) - MEASURE(`Sales LY`))

    / NULLIF(MEASURE(`Sales LY`), 0)

 

# Boolean status flag

- name: Sales Hit Target

  expr: >

    CASE WHEN MEASURE(`Sales LM`) > MEASURE(`Sales Target`)

    THEN 1 ELSE 0 END

Backtick rule

Measure names containing spaces must be wrapped in backticks inside MEASURE() references: MEASURE(`Sales LM`). Single-word names work either way, but use backticks consistently to avoid ambiguity.

 

4.4  Accelerating the Translation with AI Coding Assistants

Translating dozens of calculated fields by hand is tedious and error-prone. A modern shortcut is to delegate the translation to an AI coding assistant. The tool reads the .twb XML (or a textual export of the calculated fields), interprets the Tableau formula language, and emits the equivalent Databricks Metric View YAML — including the period-over-period CASE WHEN patterns shown above.

Several assistants work well for this task:

  • Claude (Sonnet or Opus). Strong at structured-format translation; paste the .twb XML and the target source table schema, and ask for the YAML directly. Handles nuanced cases like LOD expressions and target logic with a clear preamble.
  • Cursor or GitHub Copilot. IDE-integrated; useful when iterating on the YAML in your repo and refining individual measures with inline suggestions.
  • Databricks Assistant / Genie Code. Runs inside the Databricks notebook environment with awareness of your Unity Catalog schemas, making it the natural choice when the source table already lives in Databricks.

A high-leverage prompt template for this work:

I have a Tableau .twbx workbook (calculated field XML pasted below) built on a

table at <catalog>.<schema>.<table> with columns: <list>.

 

Generate a Databricks Metric View YAML (version: 0.1) that:

  1. 1. Declares each calculated dimension and measure
  2. 2. Translates period-over-period logic using ADD_MONTHS + scalar subquery

     anchored to MAX(order_date) in the source table

  1. 3. Uses MEASURE(`name`) syntax for derived measures
  2. 4. Handles NULL division with NULLIF()

 

Tableau calculated fields:

<paste the <column> elements from the .twb XML here>

Caveat

AI-generated YAML still needs review. LLMs occasionally hallucinate column names, miss edge cases in date arithmetic, or invent dimension types that do not exist in the source. Always run DESCRIBE on the source table before applying the YAML, and validate the syntax in the Metric View editor — which highlights errors in real time.

 

5. The Complete Metric View YAML

Below is the full working YAML for the Superstore KPI workbook, validated in a live Databricks workspace (workspace.default schema, Serverless Starter Warehouse). It captures the four headline KPIs — Sales, Profit, Profit Ratio, and Customers — along with last-month / last-year variants, period-over-period deltas, year-to-date / quarter-to-date roll-ups, and target tracking.

version: 0.1

 

source: default.sample_superstore

 

# ── Dimensions ───────────────────────────────────────────────

dimensions:

  - name: Order Date

    expr: order_date

  - name: Order Month

    expr: DATE_TRUNC('MONTH', order_date)

  - name: Order Quarter

    expr: DATE_TRUNC('QUARTER', order_date)

  - name: Order Year

    expr: YEAR(order_date)

  - name: Customer ID

    expr: customer_id

  - name: Customer Name

    expr: customer_name

  - name: Segment

    expr: segment

  - name: Region

    expr: region

  - name: State/Province

    expr: state_province

  - name: Category

    expr: category

  - name: Sub-Category

    expr: sub_category

  - name: Product Name

    expr: product_name

  - name: Ship Mode

    expr: ship_mode

 

# ── Measures ─────────────────────────────────────────────────

measures:

  # Core

  - name: Sales

    expr: SUM(sales)

  - name: Profit

    expr: SUM(profit)

  - name: Profit Ratio

    expr: SUM(profit) / NULLIF(SUM(sales), 0)

  - name: Customers

    expr: COUNT(DISTINCT customer_id)

  - name: Orders

    expr: COUNT(DISTINCT order_id)

 

  # Last Month

  - name: Sales LM

    expr: |

      SUM(CASE WHEN order_date >= ADD_MONTHS(DATE_TRUNC('MONTH',

        (SELECT MAX(order_date) FROM default.sample_superstore)), -1)

       AND order_date < DATE_TRUNC('MONTH',

        (SELECT MAX(order_date) FROM default.sample_superstore))

      THEN sales END)

  # ... (Profit LM, Customers LM follow same pattern)

 

  # Period-over-period

  - name: Sales % Change LMLY

    expr: >

      (MEASURE(`Sales LM`) - MEASURE(`Sales LY`))

      / NULLIF(MEASURE(`Sales LY`), 0)

 

  # Targets

  - name: Sales Target

    expr: MEASURE(`Sales LY`) * 1.10

  - name: Sales Hit Target

    expr: CASE WHEN MEASURE(`Sales LM`) > MEASURE(`Sales Target`) THEN 1 ELSE 0 END

  - name: Profit Ratio Target

    expr: 0.15

 

6. Registering the Metric View in Databricks

With the YAML written, the next step is to register it as a first-class object in Unity Catalog. Databricks offers two paths: an interactive UI for one-off creation, and a programmatic path for CI/CD or repository-driven workflows.

6.1  Creating via the Catalog Explorer UI

In the Databricks workspace, navigate to Catalog → select your target catalog and schema (here, workspace.default) → Create → Metric View. You will be prompted for a name and location:

Rojaboina_0-1780782791546.jpeg

 

Figure 1 — Create metric view dialog: name set to sample_superstore_metric_view, location workspace.default.

After clicking Create, Databricks opens a side-by-side editor with the YAML on the right and a live UI preview / catalog tree on the left. Paste the YAML, choose a warehouse (Serverless Starter Warehouse works for development), and click Save. The editor validates syntax in real time and surfaces parse errors inline.

Rojaboina_1-1780782791571.jpeg

 

Figure 2 — YAML editor with the Superstore Metric View loaded. The Catalog Explorer panel on the left confirms the view registered in workspace.default; the YAML on the right shows dimensions and measures resolved against default.sample_superstore.

 

6.2  Creating Programmatically

For teams managing metric definitions in a Git repository, Metric Views can be created and updated via the Databricks CLI or REST API. Both accept the same YAML payload, making CI/CD straightforward:

# Databricks CLI

databricks metric-views create \

  --catalog workspace \

  --schema default \

  --name sample_superstore_metric_view \

  --definition-file superstore_metrics_view.yaml

 

# Or use SQL DDL in a notebook / SQL editor:

CREATE OR REPLACE VIEW workspace.default.sample_superstore_metric_view

WITH METRICS

LANGUAGE YAML

AS $$

<paste YAML body here>

$$;

 

7. Using the Metric View

7.1  Querying with SQL

A registered Metric View is queryable from any SQL surface in Databricks. The important difference from a regular view is that measures must be invoked via the MEASURE() function — they are not plain columns. Dimensions are referenced by name (with backticks if they contain spaces).

-- Sales and Profit Ratio by Region, current vs last month

SELECT

    `Region`,

    MEASURE(`Sales`)                 AS sales,

    MEASURE(`Sales LM`)              AS sales_last_month,

    MEASURE(`Sales % Change LMLY`)   AS sales_pct_change,

    MEASURE(`Profit Ratio`)          AS profit_ratio,

    MEASURE(`Profit Ratio Below Target`) AS below_target_flag

FROM workspace.default.sample_superstore_metric_view

GROUP BY `Region`

ORDER BY MEASURE(`Sales`) DESC;

Note that ORDER BY against a measure must also use the MEASURE() wrapper. Aliases declared in the SELECT cannot be referenced in ORDER BY with the measure name alone — Databricks resolves measure references through the function call.

 

7.2  Natural Language with Genie

Register the Metric View as a data source in a Genie space, and business users can ask questions in plain English:

  • "Which region had the highest profit ratio last month?"
  • "Show me sales hit-target performance by category."
  • "What is the year-over-year sales change by sub-category?"

Genie uses the declared measure definitions to construct correct SQL — including the MEASURE() wrappers — eliminating the risk of ad-hoc aggregation errors. Because measure semantics are encoded in the YAML, the answers Genie produces are consistent across users and questions.

 

7.3  AI/BI Dashboards

Connect an AI/BI Dashboard directly to the Metric View. Databricks auto-suggests visualisation types based on measure characteristics and generates chart configurations from dimension and measure names. The KPI cards, sparklines, and period-comparison deltas from the original Tableau workbook can be recreated in minutes — and any change to the underlying YAML propagates to every dashboard built on it.

 

8. Best Practices and Gotchas

8.1  Normalise Column Names Before Building the View

The Sample Superstore Excel ships with headers containing spaces, slashes, and dashes (Order Date, Sub-Category, State/Province). Convert all column names to lowercase snake_case during ingestion so every expr field can reference them without backtick escaping. A one-time rename (order_date, sub_category, state_province) permanently eliminates that friction.

8.2  Scalar Subqueries Beat Window Functions for Date Anchors

The pattern (SELECT MAX(order_date) FROM source_table) inside a CASE WHEN gives you a dynamic "latest date" anchor without needing a calendar table or a parameter. This is the Databricks equivalent of Tableau's {MAX([Order Date])} LOD expression — and unlike MAX(...) OVER (), it nests cleanly inside aggregations.

8.3  Use ADD_MONTHS, Not DATE_ADD, for Month Arithmetic

For shifting dates by N months, always use ADD_MONTHS(date, n) — not DATE_ADD(date, n * 30). The latter breaks on months with different day counts and produces wrong results at month boundaries. ADD_MONTHS handles leap years and 28/29/30/31-day months correctly.

8.4  Wrap Measure References in Backticks

Inside both YAML expressions and SQL queries, measure names containing spaces require backticks: MEASURE(`Sales LM`), MEASURE(`Profit Ratio`). Missing backticks produce a parse error that is easy to miss in a long YAML. Use them consistently even for single-word names to avoid the inconsistency.

8.5  Tableau Table Calculations Do Not Translate Directly

Tableau's table calculations (RUNNING_SUM, LOOKUP, RANK) are computed after aggregation, row by row within a visualisation partition. They have no direct Metric View equivalent. Running totals can be approximated with window functions (SUM(...) OVER (ORDER BY ...)), but these behave differently when filters or groupings change. Flag these fields for manual review during migration — they are the most likely source of result discrepancies between the Tableau original and the Databricks port.

8.6  Validate Sample Aggregates Against the Source Workbook

Before declaring the migration done, pick three or four measures (Sales YTD, Profit Ratio, % Change LMLY) and compare numbers between the Tableau workbook and the Databricks Metric View for an identical filter. A 0.1% discrepancy usually points to a date-boundary edge case; a >5% discrepancy usually points to a translation error in an LOD or table calculation.

 

9. Conclusion

Tableau workbooks are a rich source of institutional knowledge about how a business wants its metrics defined. The calculated fields, date logic, and target thresholds inside a .twbx file represent years of alignment between data and business teams.

Databricks Metric Views offer a path to preserve that knowledge in an open, lakehouse-native format — one that is queryable by SQL, understandable by Genie, and reusable across every downstream tool connected to Unity Catalog. The migration process follows a clear pattern: extract calculated fields from the .twb XML, translate Tableau LOD expressions into scalar subqueries, map dimension and measure roles to YAML structure, and normalise source column names. AI coding assistants compress what used to be a multi-week project into a single afternoon of focused review.

The result is a governed semantic layer that any tool or AI agent in the Databricks ecosystem can consume consistently — and that survives the inevitable next BI tool migration without rebuilding business logic from scratch.

 

Resources

Sample Superstore data: https://public.tableau.com/app/learn/sample-data KPI Designs workbook (Stanke): https://public.tableau.com/app/profile/stanke/viz/KPIDesigns/KPIDesign Databricks AI/BI documentation: https://docs.databricks.com/en/ai-bi/index.html

 #Databricks #Metric Views #Tableau

0 REPLIES 0