Databricks Asset Bundles and CI/CD:
- Databricks Asset Bundles let you package and deploy Databricks assets (such as notebooks, libraries, and jobs) in a structured manner.
- They are useful for automating and customizing CI/CD workflows in your GitHub repositories using GitHub Actions and the Databricks CLI (see the sketches after this list).
- You define bundle configurations in YAML files to manage your assets.
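For illustration, here is a minimal sketch of such a YAML configuration (a `databricks.yml` file). The bundle name, pipeline name, notebook path, and workspace host are all placeholders, not values from the original answer:

```yaml
# Minimal databricks.yml sketch; all names and paths are illustrative.
bundle:
  name: my_dlt_project

resources:
  pipelines:
    my_pipeline:
      name: my-dlt-pipeline
      libraries:
        - notebook:
            path: ./pipelines/transform.py

targets:
  dev:
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```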
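And a sketch of a GitHub Actions workflow that deploys the bundle with the Databricks CLI. The workflow name, branch, target name, and secret names are assumptions you would adapt to your repository:

```yaml
# .github/workflows/deploy-bundle.yml -- illustrative CI/CD workflow.
name: deploy-bundle

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI on the runner.
      - uses: databricks/setup-cli@main
      # Deploys the bundle to the 'dev' target defined in databricks.yml.
      - run: databricks bundle deploy -t dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```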
Unity Catalog Support for DLT Pipelines:
- Unity Catalog is Databricks' unified governance layer for data; with Unity Catalog support for DLT, you can define and manage the tables, views, and materialized views your pipelines produce.
- With Unity Catalog, you can:
- Define a catalog where your pipeline will persist tables.
- Read data from Unity Catalog tables.
- Query tables created by pipelines using both Python and SQL interfaces (see the Python sketch after this list).
- Use shared Unity Catalog clusters with Databricks Runtime 13.1 and above or a SQL warehouse.
- However, there are some limitations:
- Existing pipelines that use the Hive metastore cannot be upgraded to use Unity Catalog.
- A single pipeline cannot write to both the Hive metastore and Unity Catalog.
- Existing pipelines not using Unity Catalog remain unaffected and continue to persist data to the Hive metastore.
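As a sketch of the Python interface, here is a minimal DLT table definition that reads an existing Unity Catalog table; the table and column names are hypothetical:

```python
# Illustrative DLT pipeline code; table and column names are assumptions.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleaned events, persisted to the catalog/schema set in the pipeline settings.")
def clean_events():
    # Read an existing Unity Catalog table by its three-level name.
    # `spark` is provided implicitly in DLT pipeline notebooks.
    return (
        spark.read.table("main.raw.events")
        .where(F.col("event_ts").isNotNull())
    )
```

Note that the pipeline persists `clean_events` to the catalog and target schema configured in the pipeline settings, not to a location named in the code.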
Specifying Catalog and Schema:
- To create tables in Unity Catalog from a DLT pipeline, you need (illustrative GRANT statements follow this list):
- USE CATALOG privileges on the target catalog.
- CREATE MATERIALIZED VIEW and USE SCHEMA privileges in the target schema (if your pipeline creates materialized views).
- CREATE TABLE and USE SCHEMA privileges in the target schema (if your pipeline creates streaming tables).
- If no target schema is specified in the pipeline settings, you need privileges on at least one schema in the target catalog.
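A minimal SQL sketch of those grants, assuming a hypothetical catalog `main`, schema `analytics`, and principal; substitute your own names:

```sql
-- Illustrative grants; the catalog, schema, and principal are assumptions.
GRANT USE CATALOG ON CATALOG main TO `pipeline_user@example.com`;
GRANT USE SCHEMA ON SCHEMA main.analytics TO `pipeline_user@example.com`;

-- If the pipeline creates materialized views:
GRANT CREATE MATERIALIZED VIEW ON SCHEMA main.analytics TO `pipeline_user@example.com`;

-- If the pipeline creates streaming tables:
GRANT CREATE TABLE ON SCHEMA main.analytics TO `pipeline_user@example.com`;
```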
Future Plans:
- Databricks has not published specific plans to support specifying the Unity Catalog and target schema directly in the YAML file for DLT pipelines, though it continually enhances its features.
- Future updates may address this limitation, but as of now it is not natively supported.
In summary, while Unity Catalog is a powerful tool for managing metadata and tables, directly specifying it in the DLT pipeline YAML file is not currently feasible. Keep an eye on Databricks updates for enhancements in this area!
For more information, refer to the official Databricks documentation on Unity Catalog with Delta Live Tables.