Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

excavator-matt
Contributor

Hi!

We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. However, in order to achieve GDPR compliance, we are planning to start using ABAC tagging.

However, I don't understand how we are supposed to implement this on streaming tables with version control. The documentation doesn't mention it.

I think you can tag the tables manually, but those tags risk getting lost as tables are recreated. I have also considered simply not allowing free access in bronze and applying tagging in version-controlled models, but that seems harsh.

Have I missed something or is the support lacking here?

2 REPLIES

ManojkMohan
Honored Contributor II

@excavator-matt 

"Can we tag streaming tables with ABAC and expect it to be safe across versions?"

  • Yes, streaming tables are fully subject to UC ABAC, but if the table is physically recreated, table-level tags can be lost

"Is there first-class support in Lakeflow for this?"

"What is a sane pattern?"

  • Use Unity Catalog everywhere for Lakeflow
  • Put GDPR-relevant tags and ABAC policies at the catalog level
  • Manage table/column tags via IaC or deployment jobs that re-apply tags after pipeline changes

ABAC policies work on Unity Catalog tables, but Lakeflow Declarative Pipelines do not version governed tags for you, so you must manage tags and policies at the UC layer.

Solution thinking:

  1. For GDPR-style constraints, attach governed tags at the catalog level
  2. ABAC policies can key off both governed tags and other attributes like table name
  3. For streaming models that are versioned (for example, customer_latest_v1, customer_latest_v2), you can keep them in a "sensitive" schema (tagged as PII) and have policies that apply uniformly
  4. Automate tags as part of CI/CD for Lakeflow
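To make step 4 concrete, here is a minimal sketch of a post-deploy CI/CD step that re-applies governed tags after a pipeline refresh, so recreated streaming tables do not lose them. The table names, tag keys, and values are illustrative assumptions, not from this thread; the `ALTER TABLE ... SET TAGS` statement is standard Unity Catalog SQL.

```python
# Hypothetical post-deploy step for a Lakeflow pipeline: rebuild the
# ALTER TABLE ... SET TAGS statements from a version-controlled mapping
# and execute them so recreated tables regain their governed tags.

def tag_statements(tags_by_table):
    """Build ALTER TABLE ... SET TAGS statements for Unity Catalog tables."""
    statements = []
    for table, tags in tags_by_table.items():
        pairs = ", ".join(f"'{k}' = '{v}'" for k, v in sorted(tags.items()))
        statements.append(f"ALTER TABLE {table} SET TAGS ({pairs})")
    return statements

# Illustrative mapping: bronze streaming tables that must carry PII tags.
TAGS = {
    "main.bronze.customer_events": {"pii": "true", "gdpr_scope": "customer"},
}

for stmt in tag_statements(TAGS):
    print(stmt)
    # In a Databricks job you would execute the statement instead:
    # spark.sql(stmt)
```

Running this as a job task right after the pipeline update keeps the tag definitions in the same repo as the pipeline code, even though they are applied out of band.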

If you share a bit about how you currently version streaming tables (e.g., drop/recreate vs ALTER vs new names), a more pipeline-specific tagging workflow can be suggested.

excavator-matt
Contributor

One way to get version control might be to use the Terraform resource entity_tag_assignment. I am not sure if it supports governed_tags, but I'll experiment in the coming weeks.

This separates the version control of where the tags are defined from the Declarative Pipeline code, but at least it is version controlled and you don't have to write and maintain something yourself.
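For reference, a Terraform sketch of that idea might look like the following. The field names and values are assumptions about the `entity_tag_assignment` resource's schema and should be verified against the current Databricks provider documentation before use.

```hcl
# Sketch only: attribute names assumed, table name illustrative.
resource "databricks_entity_tag_assignment" "bronze_pii" {
  entity_type = "tables"                      # assumed entity type value
  entity_name = "main.bronze.customer_events" # illustrative table
  tag_key     = "pii"
  tag_value   = "true"
}
```

Because Terraform re-applies state on each apply, a tag dropped by a table recreation would be restored on the next run, which addresses the original concern about tags getting lost.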