For your consideration:
To migrate legacy dashboards from the Hive Metastore to Unity Catalog in Databricks programmatically while retaining SQL queries, data visualizations, and ensuring compatibility with Unity Catalog schemas and tables using Python, here are some approaches and best practices based on the search results:
-
Migration of SQL Queries:
- You can migrate SQL queries by rewriting them to use Unity Catalog's three-level namespace (
catalog.schema.table
), which is required for Unity Catalog compatibility.
- If your SQL queries use tables from the Hive Metastore, these tables should first be migrated to Unity Catalog using utilities like
CREATE TABLE AS SELECT
or CREATE TABLE CLONE
(preferably deep clones for Delta tables, which preserve Delta logs).
-
Migrating Data Visualizations:
- Data visualizations tied to queries in the legacy dashboards can be recreated by cloning the dashboards. Databricks provides tools for cloning legacy dashboards to Lakeview dashboards, which offers expanded API support for creating, getting, and updating dashboards programmatically.
- Queries and visualizations can be copied to create new dashboards from the SQL editor, ensuring minimal modification during migration.
-
Compatibility with Unity Catalog Schemas/Tables:
- Make sure the schema and table definitions are adjusted for Unity Catalog. Unity Catalog supports managed tables in Delta format as well as external tables.
- Use the
SYNC
SQL command or Hive Metastore federation to migrate external tables or enable a hybrid workspace for incremental migration.
- For external locations, ensure adequate setup of Unity Catalog-managed access controls and storage credentials.
-
Python Automation:
- The Databricks Python SDK can be utilized for orchestrating migration processes. You can identify and migrate tables programmatically by leveraging SDK utilities (
databricks.sdk.service.catalog
) or issuing SQL commands programmatically via Databricks compute resources.
- Databricks recommends using deep clone operations for managed Delta tables or the
CREATE TABLE AS SELECT
approach for broader migration options.
- Dashboards can be manipulated programmatically using Databricks SQL Queries and Dashboard APIs.
-
Additional Considerations for Dashboard Migration:
- If dropdown list parameters or query-backed parameters exist in legacy dashboards, these should be migrated to comply with Unity Catalog standards.
- Legacy dashboards can be cloned to Lakeview dashboards programmatically, making it easier to move previously embedded SQL queries and visualizations.
-
Recommended Tools and Resources:
- Enable Hive Metastore federation to maintain access to Hive tables during active migration processes. Federation simplifies incremental migration by allowing query execution from both Hive Metastore and Unity Catalog during the transition.
- Prioritize migration tools like the UCX toolkit for managing schema and table upgrades and handling both managed and external Unity Catalog tables.
Challenges or additional steps may apply depending on the specifics of your dashboards or infrastructure. For a seamless migration, carefully assess data dependencies, storage formats, and any legacy constructs that may require adaptation to comply with Unity Catalog standards. Always test the migrated dashboards and data pipelines in a staging environment before final deployment.