
WAP pattern with Unity Catalog

jorperort
New Contributor III

Good afternoon,

I am looking for documentation on implementing the Write-Audit-Publish (WAP) pattern using Unity Catalog, Workflows, SQL notebooks, and any other services the pattern requires. Could you share documentation, a practical example, or any other patterns or best practices I should consider? For example, would it make sense to use a separate staging schema?

Best regards, and thank you in advance for your help.

Accepted Solutions

VZLA
Databricks Employee

@jorperort thanks for your question!

To implement the Write-Audit-Publish (WAP) pattern in Databricks using Unity Catalog, workflows, and SQL notebooks, follow these steps:

  1. Set Up Unity Catalog: Configure Unity Catalog for unified governance across data assets.
  2. Create Schemas: Use separate schemas for staging and production to manage data lifecycle:
    • CREATE SCHEMA IF NOT EXISTS staging;
    • CREATE SCHEMA IF NOT EXISTS production;
  3. Develop SQL Notebooks: Write SQL notebooks for the ingestion, validation and transformation, and publishing tasks:
    • Ingest data into staging.
    • Validate and transform the data.
    • Publish to production.
  4. Automate with Workflows: Set up Databricks Workflows to automate notebook execution in sequence: ingest, validate, transform, and publish.
  5. Follow WAP Steps:
    • Write: Load raw data into the staging schema.
    • Audit: Validate and transform data within staging.
    • Publish: Move validated data to production.
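
As a rough illustration of the steps above, here is a minimal sketch in Databricks SQL. It is not an official reference implementation. The catalog `main`, the source table `main.raw.orders_batch`, the table name `orders`, and the column `order_id` are all hypothetical placeholders, and the checks are only examples of what an audit might look for. It also assumes the tables are Delta tables, so that replacing the production table happens as a single atomic commit.

```sql
-- Assumption: a Unity Catalog catalog named `main` exists and you are
-- allowed to create schemas and tables in it.
USE CATALOG main;

CREATE SCHEMA IF NOT EXISTS staging;
CREATE SCHEMA IF NOT EXISTS production;

-- WRITE: land the new batch in staging only; production is untouched.
-- `main.raw.orders_batch` is a hypothetical source table.
CREATE OR REPLACE TABLE staging.orders AS
SELECT * FROM main.raw.orders_batch;

-- AUDIT: each statement raises an error if its check fails.
-- The specific checks (non-empty, no NULL keys, unique keys) are examples.
SELECT assert_true(count(*) > 0, 'staging.orders is empty')
FROM staging.orders;

SELECT assert_true(count_if(order_id IS NULL) = 0, 'NULL order_id found in staging.orders')
FROM staging.orders;

SELECT assert_true(count(*) = count(DISTINCT order_id), 'duplicate order_id found in staging.orders')
FROM staging.orders;

-- PUBLISH: only reached if every audit statement above succeeded.
CREATE OR REPLACE TABLE production.orders AS
SELECT * FROM staging.orders;
```

In a Workflows job, one option is to put the write, audit, and publish sections in separate notebook tasks that depend on each other in that order. A failed assertion then fails the audit task, and the publish task does not run, so consumers of `production.orders` keep seeing the last validated version.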


szymon_dybczak
Esteemed Contributor III

Hi @jorperort ,

In addition to the helpful step-by-step instructions that @VZLA has provided, you can also take a look at a short presentation of the WAP pattern on the official Databricks YouTube channel:

https://youtu.be/4K3zAmUgViE?t=492
