Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Structured Streaming using an ADF (Azure Data Factory) pipeline fails on an RLS-enabled table

karunakaran_r
New Contributor III

We are in the process of implementing Row-Level Security (RLS) on a table in Databricks. As per our architecture, data ingestion is handled via Structured Streaming using an ADF (Azure Data Factory) pipeline.

However, we are encountering the following error during ingestion:

pyspark.errors.exceptions.connect.AnalysisException:
[RequestId=a1541086-31ad-48e3-8781-3caefaae2c63
ErrorClass=INVALID_PARAMETER_VALUE.PATH_BASED_ACCESS_NOT_SUPPORTED_FOR_TABLES_WITH_ROW_COLUMN_ACCESS_POLICIES]
Path-based access to table ********* with row filter or column mask not supported.
We've verified that we are using the fully qualified Unity Catalog table name (catalog.schema.table) for both reading from and writing to the table. However, the checkpoint location is currently specified as a path (e.g., abfss://...).

Could this path-based checkpointing be the root cause of the issue? If so, what is the recommended approach to ingest data using Structured Streaming into an RLS-enabled table while complying with Unity Catalog constraints?

We would appreciate guidance on how to properly configure the checkpointing or ingestion process in this context.

 

from pyspark.sql.functions import lit

# df is the streaming source DataFrame; constant metadata columns are added
# before each micro-batch is handed to process_batch. The checkpoint location
# is the abfss:// path discussed above.
streaming_query = (
    df
    .withColumn(BUSINESS_UNIT_COLUMN, lit(business_unit))
    .withColumn(SEGMENT_NAME_COLUMN, lit(segment_name))
    .withColumn(SOURCE_SYSTEM_NAME_COLUMN, lit(source_system_name))
    .writeStream
    .foreachBatch(process_batch)
    .outputMode("append")
    .option("checkpointLocation", checkpoint_file_path_gold)
    .option("skipChangeCommits", "true")
    .trigger(availableNow=True)
    .start()
)
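process_batch itself is not shown in the post; for reference, a minimal hypothetical sketch, assuming it simply appends each micro-batch to the gold table by its three-level Unity Catalog name (the table name below is a placeholder):

# Hypothetical process_batch: writes each micro-batch by Unity Catalog
# table name rather than by storage path, which tables with row filters
# or column masks require.
GOLD_TABLE = "my_catalog.my_schema.my_gold_table"  # placeholder

def process_batch(batch_df, batch_id):
    batch_df.write.mode("append").saveAsTable(GOLD_TABLE)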

 

2 REPLIES

mani_22
Databricks Employee

@karunakaran_r Are you trying to read or write to a table with RLS/CM enabled using a file path directly instead of specifying the table name?

You cannot use the table's storage path to read or write tables with RLS/CM enabled. You have to reference the table by its three-level name (catalog.schema.table).

E.g.:

=> Use spark.read.table("my_catalog.my_schema.my_secure_table") instead of spark.read.format("delta").load("abfss://my-secure-table-path")
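The same applies on the streaming side. A minimal sketch using the table-name APIs (the table names and checkpoint path below are placeholders):

# Stream from and to RLS/CM-enabled tables by Unity Catalog name.
# Note: skipChangeCommits is a Delta streaming read option, so it belongs
# on readStream rather than writeStream.
df = (
    spark.readStream
    .option("skipChangeCommits", "true")
    .table("my_catalog.my_schema.my_secure_table")
)

query = (
    df.writeStream
    .option("checkpointLocation", "abfss://checkpoints@myaccount.dfs.core.windows.net/gold/my_secure_table")
    .trigger(availableNow=True)
    .toTable("my_catalog.my_schema.my_gold_table")
)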

 

karunakaran_r
New Contributor III

@mani_22, thank you for your response. We are using the Unity Catalog table name to read and write the table, and we were able to identify the root cause and fix it. The checkpoint location had been created inside the storage location of the table where RLS is applied, which was causing the path-based access issue; the stream started working after we moved the checkpoint outside the table location. Thanks again.
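For anyone who hits the same error, the before/after in code (the paths below are illustrative):

# Failed: checkpoint nested inside the RLS-enabled table's storage location,
# which triggers path-based access checks against the secured table.
# checkpoint_file_path_gold = "abfss://data@myaccount.dfs.core.windows.net/gold/my_table/_checkpoint"

# Worked: checkpoint in a separate location outside the table directory.
checkpoint_file_path_gold = "abfss://checkpoints@myaccount.dfs.core.windows.net/gold/my_table"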
