We are in the process of implementing Row-Level Security (RLS) on a table in Databricks. In our architecture, data ingestion is handled via Spark Structured Streaming, orchestrated by an Azure Data Factory (ADF) pipeline.
However, we are encountering the following error during ingestion:
pyspark.errors.exceptions.connect.AnalysisException:
[RequestId=a1541086-31ad-48e3-8781-3caefaae2c63
ErrorClass=INVALID_PARAMETER_VALUE.PATH_BASED_ACCESS_NOT_SUPPORTED_FOR_TABLES_WITH_ROW_COLUMN_ACCESS_POLICIES]
Path-based access to table ********* with row filter or column mask not supported.
We have verified that we are using the fully qualified Unity Catalog table name (catalog.schema.table) for both reading from and writing to the table. However, the checkpoint location is currently specified as a storage path (e.g., abfss://...).
Could this path-based checkpointing be the root cause of the issue? If so, what is the recommended approach to ingest data using Structured Streaming into an RLS-enabled table while complying with Unity Catalog constraints?
We would appreciate guidance on how to properly configure the checkpointing or ingestion process in this context.
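For context, the relevant ingestion code is below. The source is read by its Unity Catalog name rather than a storage path (simplified; the table name here is a placeholder):

# Source is read by its Unity Catalog name, not an abfss:// path
df = (
    spark.readStream
    .table("catalog.schema.source_table")  # placeholder name
)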
from pyspark.sql.functions import lit

streaming_query = (
    df
    # Enrich each record with static lineage columns
    .withColumn(BUSINESS_UNIT_COLUMN, lit(business_unit))
    .withColumn(SEGMENT_NAME_COLUMN, lit(segment_name))
    .withColumn(SOURCE_SYSTEM_NAME_COLUMN, lit(source_system_name))
    .writeStream
    .foreachBatch(process_batch)
    .outputMode("append")
    # Checkpoint is currently a direct abfss:// path -- the suspected culprit
    .option("checkpointLocation", checkpoint_file_path_gold)
    .option("skipChangeCommits", "true")  # note: for Delta sources this is normally a read-side option
    .trigger(availableNow=True)
    .start()
)
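
For reference, process_batch writes each micro-batch to the gold table by its Unity Catalog name rather than a storage path. A simplified sketch of what it does (the table name is a placeholder):

def process_batch(batch_df, batch_id):
    # Write by Unity Catalog name (catalog.schema.table), never by abfss:// path,
    # since path-based access is blocked on tables with row filters / column masks
    (
        batch_df.write
        .mode("append")
        .saveAsTable("catalog.schema.gold_table")  # placeholder name
    )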