12-30-2024 08:30 AM
Good afternoon,
I am looking for documentation on implementing the WAP (Write-Audit-Publish) pattern using Unity Catalog, workflows, SQL notebooks, and any other services needed for this pattern. Could you share documentation, a practical example, or any other patterns or best practices I should consider, such as separating out a staging schema?
Best regards, and thank you in advance for your help.
Accepted Solutions
12-30-2024 08:35 AM
@jorperort thanks for your question!
To implement the Write-Audit-Publish (WAP) pattern in Databricks using Unity Catalog, workflows, and SQL notebooks, follow these steps:
- Set Up Unity Catalog: Configure Unity Catalog for unified governance across your data assets (an example of schema-level grants is shown after this list).
- Create Schemas: Use separate schemas for staging and production to manage the data lifecycle:
CREATE SCHEMA IF NOT EXISTS staging;
CREATE SCHEMA IF NOT EXISTS production;
- Develop SQL Notebooks: Write SQL notebooks for the ingestion, validation, and transformation tasks (a minimal SQL sketch of these steps follows this list):
- Ingest data into staging.
- Validate and transform the data.
- Publish to production.
- Automate with Workflows: Set up Databricks Workflows to run the notebooks in sequence: ingest, validate, transform, and publish (an example job definition appears after this list).
- Follow WAP Steps:
- Write: Load raw data into the staging schema.
- Audit: Validate and transform data within staging.
- Publish: Move validated data to production.
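
On the governance side, Unity Catalog lets you keep staging writable only by the pipeline while exposing production read-only to consumers. A minimal sketch, assuming a catalog named `main`, a service principal `wap_pipeline_sp`, and a group `analysts` (all illustrative names):

```sql
-- Only the pipeline's service principal can write to staging
-- (grantees also need USE CATALOG on the parent catalog)
GRANT USE SCHEMA, CREATE TABLE, MODIFY, SELECT ON SCHEMA main.staging TO `wap_pipeline_sp`;

-- Consumers get read-only access to production
GRANT USE SCHEMA, SELECT ON SCHEMA main.production TO `analysts`;
```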
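For the notebook steps, here is a minimal SQL sketch of the three WAP stages. The catalog `main`, the source table `main.raw.events`, and the null check on `event_id` are illustrative assumptions; adapt them to your own tables and validation rules:

```sql
-- Write: load raw data into the staging schema
CREATE OR REPLACE TABLE main.staging.events AS
SELECT * FROM main.raw.events;

-- Audit: raise_error() makes this cell fail,
-- which stops the workflow before anything is published
SELECT
  CASE
    WHEN count_if(event_id IS NULL) > 0
      THEN raise_error('Audit failed: NULL event_id values found in staging')
    ELSE 'audit passed'
  END AS audit_result
FROM main.staging.events;

-- Publish: promote the validated data to production
CREATE OR REPLACE TABLE main.production.events AS
SELECT * FROM main.staging.events;
```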
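To wire those notebooks together with Databricks Workflows, a job along these lines runs them in sequence and halts if the audit task fails. This is a sketch in Databricks Asset Bundles YAML; the job name, task keys, and notebook paths are assumptions for illustration, and the same job can be built in the Workflows UI or via the Jobs API:

```yaml
# Sketch of a sequential WAP job (Asset Bundles resource file).
# Notebook paths and names are illustrative; compute settings omitted.
resources:
  jobs:
    wap_pipeline:
      name: wap_pipeline
      tasks:
        - task_key: write
          notebook_task:
            notebook_path: ./notebooks/01_write_to_staging.sql
        - task_key: audit
          depends_on:
            - task_key: write
          notebook_task:
            notebook_path: ./notebooks/02_audit_staging.sql
        - task_key: publish
          depends_on:
            - task_key: audit
          notebook_task:
            notebook_path: ./notebooks/03_publish_to_production.sql
```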
12-30-2024 08:44 AM - edited 12-30-2024 08:45 AM
Hi @jorperort ,
In addition to the nice step-by-step instructions that @VZLA provided, you can also take a look at a short presentation of the WAP pattern on the official Databricks YouTube channel:
https://youtu.be/4K3zAmUgViE?t=492

