Saturday
How to extract Tableau calculated fields, dimensions, and measures from a .twbx workbook and re-express them as a production-grade Databricks Metric View YAML — with the Sample Superstore dataset as a complete worked example, accelerated by AI coding assistants.
Dataset Sample Superstore | Source KPI_Designs.twbx | Platform Databricks Unity Catalog |
Tableau remains one of the most widely deployed BI tools in the enterprise, and its .twbx workbook format is a rich artefact: it bundles not just visualisations but also a full semantic layer — calculated fields, period-over-period logic, dimension hierarchies, and target thresholds — all expressed in Tableau's formula language.
Databricks Metric Views, available in Unity Catalog as part of the AI/BI surface, offer a YAML-native way to declare that same semantic layer inside your lakehouse. Once registered, a Metric View becomes a reusable, queryable object that powers Genie conversations, AI/BI Dashboards, and any SQL client connected to your warehouse — without rebuilding logic in every downstream tool.
This article walks through a complete migration using KPI_Designs.twbx — a publicly available Tableau workbook built on the Sample Superstore dataset — as the source, and produces a working version: 0.1 Metric View YAML that was validated live in a Databricks workspace. The translation itself was accelerated using an AI coding assistant; the techniques generalise to Claude, Cursor, GitHub Copilot, or Databricks Assistant / Genie Code.
Every analytics organisation eventually runs into the same issue: the same metric is calculated differently in different tools. The marketing team's Tableau workbook defines "Profit Ratio" as SUM(Profit)/SUM(Sales) for the current month. The finance team's Looker explore defines it as AVG(profit_margin) across all rows. Both are called "Profit Ratio" on executive dashboards. Neither team knows the other exists.
This is the semantic layer problem — the gap between raw data in a warehouse and the agreed-upon business definitions that should govern every report. Metric Views solve it by making those definitions a first-class citizen of the lakehouse, stored in Unity Catalog alongside the tables they describe.
A Metric View is a Unity Catalog object defined in YAML. It references a source table, declares typed dimensions (the axes by which you slice data), and declares measures (the aggregations you care about). Measures can reference other measures using the MEASURE() function, enabling compound calculations without repeating SQL.
# Minimal metric view structure
version: 0.1
source: catalog.schema.table
dimensions:
- name: Order Date
expr: order_date
measures:
- name: Sales
expr: SUM(sales)
- name: Profit Ratio
expr: SUM(profit) / NULLIF(SUM(sales), 0)
Metric Views deliver four concrete benefits for data engineering teams:
Key insight Migrating from Tableau is not just about moving charts — it is about promoting your semantic layer (the business logic inside calculated fields) from a proprietary workbook format into an open, versionable, lakehouse-native standard. |
A .twbx file is a ZIP archive. Unzipping it reveals a .twb XML document alongside any packaged data extracts (.hyper files) and images. The .twb is where all the metric logic lives.
# Inspect the archive contents
unzip -l KPI_Designs.twbx
# Output:
# KPI Designs.twb ← all metric logic in XML
# Data/TEMP_*.hyper ← packaged data extract
# Image/Mini.png ← workbook thumbnail
The .twb XML has a predictable structure. The elements relevant to metric migration are:
For KPI_Designs.twbx specifically, the workbook contains 50+ calculated fields covering Sales, Profit, Profit Ratio, and Customers — each with last-month, last-year, YTD, QTD, and running-total variants. These calculated fields are the raw material that needs to be translated into Metric View measures.
Most Tableau concepts have a clean Metric View equivalent, but a few — notably LOD expressions, table calculations, and parameters — require deliberate adaptation rather than direct translation.
Tableau Concept | Databricks Metric View Equivalent |
Dimension field | dimensions[] entry — name plus expr referencing a column or SQL expression used to slice measures. |
Measure field | measures[] entry — name plus expr containing an aggregation such as SUM, COUNT DISTINCT, or AVG. |
Calculated dimension | dimensions[].expr with an arbitrary SQL expression, e.g. DATE_TRUNC('MONTH', order_date). |
Calculated measure | measures[].expr containing the aggregation, optionally referencing other measures via MEASURE(`name`). |
FIXED LOD expression | Scalar subquery works for global aggregates like {MAX([Order Date])}. Row-level FIXED LODs like {FIXED [Sub-Category] : SUM([Sales])} need window functions in a derived view or a separate aggregated table — they do not have a direct one-line equivalent. |
Table calculation | Window function inside the measure expr (e.g. SUM(...) OVER (ORDER BY ...)). Behaviour differs from Tableau, which computes after the visualisation partition is formed — validate results carefully. |
Parameter (user input) | No native equivalent. Either hardcode a sensible default in the measure (e.g. * 1.10 growth assumption) or expose the value as a filter applied at query/dashboard time in AI/BI. |
Date filter / TF field | CASE WHEN inside the measure expr with bounds computed via ADD_MONTHS and DATE_TRUNC against the latest data date. |
Set / group | Either a derived dimension with a CASE WHEN, or a join to a mapping table declared as a separate source. |
The most intricate part of the Superstore workbook is the period-over-period logic. Tableau achieves this with Level of Detail (LOD) expressions and boolean "time filter" calculated fields. In a Metric View, you replicate the global-anchor variant with CASE WHEN inside an aggregation, where the anchor date is computed via a scalar subquery against the source table.
# Tableau calculated field: Sales | LM
# IF [Date TF] THEN [Sales] END
# where [Date TF] checks if order_date falls in the most recent complete month
# Databricks Metric View equivalent:
- name: Sales LM
expr: |
SUM(CASE
WHEN order_date >= ADD_MONTHS(
DATE_TRUNC('MONTH',
(SELECT MAX(order_date) FROM default.sample_superstore)), -1)
AND order_date < DATE_TRUNC('MONTH',
(SELECT MAX(order_date) FROM default.sample_superstore))
THEN sales
END)
The scalar subquery resolves the anchor date dynamically at query time, which mirrors how Tableau's {MAX([Order Date])} LOD expression behaves at the data source level. The same pattern with ADD_MONTHS(..., -13) and ADD_MONTHS(..., -12) produces the same-month-last-year window.
Compound metrics like Profit Ratio or percentage-change deltas should reference already-defined measures rather than repeating aggregation SQL. Databricks provides the MEASURE() function for this purpose, both inside the YAML and when querying the view downstream:
# Profit Ratio — derived from two other measures
- name: Profit Ratio LM
expr: MEASURE(`Profit LM`) / NULLIF(MEASURE(`Sales LM`), 0)
# Period-over-period delta
- name: Sales % Change LMLY
expr: >
(MEASURE(`Sales LM`) - MEASURE(`Sales LY`))
/ NULLIF(MEASURE(`Sales LY`), 0)
# Boolean status flag
- name: Sales Hit Target
expr: >
CASE WHEN MEASURE(`Sales LM`) > MEASURE(`Sales Target`)
THEN 1 ELSE 0 END
Backtick rule Measure names containing spaces must be wrapped in backticks inside MEASURE() references: MEASURE(`Sales LM`). Single-word names work either way, but use backticks consistently to avoid ambiguity. |
Translating dozens of calculated fields by hand is tedious and error-prone. A modern shortcut is to delegate the translation to an AI coding assistant. The tool reads the .twb XML (or a textual export of the calculated fields), interprets the Tableau formula language, and emits the equivalent Databricks Metric View YAML — including the period-over-period CASE WHEN patterns shown above.
Several assistants work well for this task:
A high-leverage prompt template for this work:
I have a Tableau .twbx workbook (calculated field XML pasted below) built on a
table at <catalog>.<schema>.<table> with columns: <list>.
Generate a Databricks Metric View YAML (version: 0.1) that:
anchored to MAX(order_date) in the source table
Tableau calculated fields:
<paste the <column> elements from the .twb XML here>
Caveat AI-generated YAML still needs review. LLMs occasionally hallucinate column names, miss edge cases in date arithmetic, or invent dimension types that do not exist in the source. Always run DESCRIBE on the source table before applying the YAML, and validate the syntax in the Metric View editor — which highlights errors in real time. |
Below is the full working YAML for the Superstore KPI workbook, validated in a live Databricks workspace (workspace.default schema, Serverless Starter Warehouse). It captures the four headline KPIs — Sales, Profit, Profit Ratio, and Customers — along with last-month / last-year variants, period-over-period deltas, year-to-date / quarter-to-date roll-ups, and target tracking.
version: 0.1
source: default.sample_superstore
# ── Dimensions ───────────────────────────────────────────────
dimensions:
- name: Order Date
expr: order_date
- name: Order Month
expr: DATE_TRUNC('MONTH', order_date)
- name: Order Quarter
expr: DATE_TRUNC('QUARTER', order_date)
- name: Order Year
expr: YEAR(order_date)
- name: Customer ID
expr: customer_id
- name: Customer Name
expr: customer_name
- name: Segment
expr: segment
- name: Region
expr: region
- name: State/Province
expr: state_province
- name: Category
expr: category
- name: Sub-Category
expr: sub_category
- name: Product Name
expr: product_name
- name: Ship Mode
expr: ship_mode
# ── Measures ─────────────────────────────────────────────────
measures:
# Core
- name: Sales
expr: SUM(sales)
- name: Profit
expr: SUM(profit)
- name: Profit Ratio
expr: SUM(profit) / NULLIF(SUM(sales), 0)
- name: Customers
expr: COUNT(DISTINCT customer_id)
- name: Orders
expr: COUNT(DISTINCT order_id)
# Last Month
- name: Sales LM
expr: |
SUM(CASE WHEN order_date >= ADD_MONTHS(DATE_TRUNC('MONTH',
(SELECT MAX(order_date) FROM default.sample_superstore)), -1)
AND order_date < DATE_TRUNC('MONTH',
(SELECT MAX(order_date) FROM default.sample_superstore))
THEN sales END)
# ... (Profit LM, Customers LM follow same pattern)
# Period-over-period
- name: Sales % Change LMLY
expr: >
(MEASURE(`Sales LM`) - MEASURE(`Sales LY`))
/ NULLIF(MEASURE(`Sales LY`), 0)
# Targets
- name: Sales Target
expr: MEASURE(`Sales LY`) * 1.10
- name: Sales Hit Target
expr: CASE WHEN MEASURE(`Sales LM`) > MEASURE(`Sales Target`) THEN 1 ELSE 0 END
- name: Profit Ratio Target
expr: 0.15
With the YAML written, the next step is to register it as a first-class object in Unity Catalog. Databricks offers two paths: an interactive UI for one-off creation, and a programmatic path for CI/CD or repository-driven workflows.
In the Databricks workspace, navigate to Catalog → select your target catalog and schema (here, workspace.default) → Create → Metric View. You will be prompted for a name and location:
Figure 1 — Create metric view dialog: name set to sample_superstore_metric_view, location workspace.default.
After clicking Create, Databricks opens a side-by-side editor with the YAML on the right and a live UI preview / catalog tree on the left. Paste the YAML, choose a warehouse (Serverless Starter Warehouse works for development), and click Save. The editor validates syntax in real time and surfaces parse errors inline.
Figure 2 — YAML editor with the Superstore Metric View loaded. The Catalog Explorer panel on the left confirms the view registered in workspace.default; the YAML on the right shows dimensions and measures resolved against default.sample_superstore.
For teams managing metric definitions in a Git repository, Metric Views can be created and updated via the Databricks CLI or REST API. Both accept the same YAML payload, making CI/CD straightforward:
# Databricks CLI
databricks metric-views create \
--catalog workspace \
--schema default \
--name sample_superstore_metric_view \
--definition-file superstore_metrics_view.yaml
# Or use SQL DDL in a notebook / SQL editor:
CREATE OR REPLACE VIEW workspace.default.sample_superstore_metric_view
WITH METRICS
LANGUAGE YAML
AS $$
<paste YAML body here>
$$;
A registered Metric View is queryable from any SQL surface in Databricks. The important difference from a regular view is that measures must be invoked via the MEASURE() function — they are not plain columns. Dimensions are referenced by name (with backticks if they contain spaces).
-- Sales and Profit Ratio by Region, current vs last month
SELECT
`Region`,
MEASURE(`Sales`) AS sales,
MEASURE(`Sales LM`) AS sales_last_month,
MEASURE(`Sales % Change LMLY`) AS sales_pct_change,
MEASURE(`Profit Ratio`) AS profit_ratio,
MEASURE(`Profit Ratio Below Target`) AS below_target_flag
FROM workspace.default.sample_superstore_metric_view
GROUP BY `Region`
ORDER BY MEASURE(`Sales`) DESC;
Note that ORDER BY against a measure must also use the MEASURE() wrapper. Aliases declared in the SELECT cannot be referenced in ORDER BY with the measure name alone — Databricks resolves measure references through the function call.
Register the Metric View as a data source in a Genie space, and business users can ask questions in plain English:
Genie uses the declared measure definitions to construct correct SQL — including the MEASURE() wrappers — eliminating the risk of ad-hoc aggregation errors. Because measure semantics are encoded in the YAML, the answers Genie produces are consistent across users and questions.
Connect an AI/BI Dashboard directly to the Metric View. Databricks auto-suggests visualisation types based on measure characteristics and generates chart configurations from dimension and measure names. The KPI cards, sparklines, and period-comparison deltas from the original Tableau workbook can be recreated in minutes — and any change to the underlying YAML propagates to every dashboard built on it.
The Sample Superstore Excel ships with headers containing spaces, slashes, and dashes (Order Date, Sub-Category, State/Province). Convert all column names to lowercase snake_case during ingestion so every expr field can reference them without backtick escaping. A one-time rename (order_date, sub_category, state_province) permanently eliminates that friction.
The pattern (SELECT MAX(order_date) FROM source_table) inside a CASE WHEN gives you a dynamic "latest date" anchor without needing a calendar table or a parameter. This is the Databricks equivalent of Tableau's {MAX([Order Date])} LOD expression — and unlike MAX(...) OVER (), it nests cleanly inside aggregations.
For shifting dates by N months, always use ADD_MONTHS(date, n) — not DATE_ADD(date, n * 30). The latter breaks on months with different day counts and produces wrong results at month boundaries. ADD_MONTHS handles leap years and 28/29/30/31-day months correctly.
Inside both YAML expressions and SQL queries, measure names containing spaces require backticks: MEASURE(`Sales LM`), MEASURE(`Profit Ratio`). Missing backticks produce a parse error that is easy to miss in a long YAML. Use them consistently even for single-word names to avoid the inconsistency.
Tableau's table calculations (RUNNING_SUM, LOOKUP, RANK) are computed after aggregation, row by row within a visualisation partition. They have no direct Metric View equivalent. Running totals can be approximated with window functions (SUM(...) OVER (ORDER BY ...)), but these behave differently when filters or groupings change. Flag these fields for manual review during migration — they are the most likely source of result discrepancies between the Tableau original and the Databricks port.
Before declaring the migration done, pick three or four measures (Sales YTD, Profit Ratio, % Change LMLY) and compare numbers between the Tableau workbook and the Databricks Metric View for an identical filter. A 0.1% discrepancy usually points to a date-boundary edge case; a >5% discrepancy usually points to a translation error in an LOD or table calculation.
Tableau workbooks are a rich source of institutional knowledge about how a business wants its metrics defined. The calculated fields, date logic, and target thresholds inside a .twbx file represent years of alignment between data and business teams.
Databricks Metric Views offer a path to preserve that knowledge in an open, lakehouse-native format — one that is queryable by SQL, understandable by Genie, and reusable across every downstream tool connected to Unity Catalog. The migration process follows a clear pattern: extract calculated fields from the .twb XML, translate Tableau LOD expressions into scalar subqueries, map dimension and measure roles to YAML structure, and normalise source column names. AI coding assistants compress what used to be a multi-week project into a single afternoon of focused review.
The result is a governed semantic layer that any tool or AI agent in the Databricks ecosystem can consume consistently — and that survives the inevitable next BI tool migration without rebuilding business logic from scratch.
Resources Sample Superstore data: https://public.tableau.com/app/learn/sample-data KPI Designs workbook (Stanke): https://public.tableau.com/app/profile/stanke/viz/KPIDesigns/KPIDesign Databricks AI/BI documentation: https://docs.databricks.com/en/ai-bi/index.html |
#Databricks #Metric Views #Tableau