
ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

excavator-matt
Contributor

Hi!

We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. In order to achieve GDPR compliance, we are planning to start using ABAC tagging.

However, I don't understand how we are supposed to implement this on streaming tables with version control. The documentation doesn't mention it.

I think you can tag the tables manually, but those tags risk getting lost as tables are recreated. I have also considered simply not allowing free access in bronze and applying tagging in version-controlled models, but that seems harsh.
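
For reference, the manual tagging I have in mind is just something like this (table, column, and tag names are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# Table-level tag on a streaming table created by the pipeline
spark.sql("""
    ALTER TABLE main.bronze.customer_events
    SET TAGS ('data_classification' = 'pii')
""")

# Column-level tag
spark.sql("""
    ALTER TABLE main.bronze.customer_events
    ALTER COLUMN email SET TAGS ('pii_type' = 'email')
""")
```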

Have I missed something or is the support lacking here?


ManojkMohan
Honored Contributor II

@excavator-matt 

“Can we tag streaming tables with ABAC and expect it to be safe across versions?”

  • Yes, streaming tables are fully subject to UC ABAC, but if the table is physically recreated, table‑level tags can be lost

“Is there first‑class support in Lakeflow for this?”

“What is a sane pattern?”

  • Use Unity Catalog everywhere for Lakeflow
  • Put GDPR‑relevant tags and ABAC policies at the catalog level (see the sketch after this list)
  • Manage table/column tags via IaC or deployment jobs that re‑apply tags after pipeline changes
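
As a rough illustration of the catalog/schema‑level tagging idea (a sketch only; catalog, schema, and tag names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook/job

# Tags set on the catalog or schema are not affected by individual tables being
# dropped and recreated by the pipeline.
spark.sql("ALTER CATALOG main SET TAGS ('gdpr_scope' = 'true')")
spark.sql("ALTER SCHEMA main.bronze SET TAGS ('data_classification' = 'pii')")
```

ABAC policies can then match on those tags regardless of how often the underlying streaming tables are rebuilt.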

ABAC policies work on Unity Catalog tables, but Lakeflow Declarative Pipelines do not version governed tags for you, so you must manage tags and policies at the UC layer.
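
One way to sanity‑check what actually survived a pipeline redeploy is to query the tag views at the UC layer (assuming the standard information_schema tag views; catalog and schema names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# List the table tags currently recorded in Unity Catalog for the bronze schema.
tags = spark.sql("""
    SELECT schema_name, table_name, tag_name, tag_value
    FROM main.information_schema.table_tags
    WHERE schema_name = 'bronze'
""")
tags.show(truncate=False)
```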

Solution thinking:

  1. For GDPR‑style constraints, attach governed tags at the catalog level
  2. ABAC policies can key off both governed tags and other attributes like table name
  3. For streaming models that are versioned (for example, customer_latest_v1, customer_latest_v2), you can keep them in a “sensitive” schema (tagged as PII) and have policies that apply uniformly
  4. Automate tag application as part of CI/CD for Lakeflow (a sketch follows this list)
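
A minimal sketch of what such a CI/CD step could look like, assuming a tag manifest kept in the repo next to the pipeline code and run as a job task after each pipeline update (all table, column, and tag names are placeholders):

```python
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Version-controlled tag manifest; in practice this could be a YAML/JSON file in the repo.
MANIFEST = json.loads("""
{
  "main.bronze.customer_events": {
    "table_tags": {"data_classification": "pii", "gdpr_scope": "true"},
    "column_tags": {"email": {"pii_type": "email"}}
  }
}
""")

def apply_tags(manifest: dict) -> None:
    """Re-apply table and column tags so they survive tables being recreated."""
    for table, spec in manifest.items():
        table_tags = spec.get("table_tags", {})
        if table_tags:
            pairs = ", ".join(f"'{k}' = '{v}'" for k, v in table_tags.items())
            spark.sql(f"ALTER TABLE {table} SET TAGS ({pairs})")
        for column, col_tags in spec.get("column_tags", {}).items():
            pairs = ", ".join(f"'{k}' = '{v}'" for k, v in col_tags.items())
            spark.sql(f"ALTER TABLE {table} ALTER COLUMN {column} SET TAGS ({pairs})")

if __name__ == "__main__":
    apply_tags(MANIFEST)
```

Running this after every deploy means a table being dropped and recreated by the pipeline only leaves a short window before its tags are re-applied.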

If you share a bit about how you currently version streaming tables (e.g., drop/recreate vs ALTER vs new names), a more pipeline‑specific tagging workflow can be worked out.