Write to External Table with Row Level Security fails

lmu
New Contributor III

Hey,

we are experiencing issues with writing to external tables when using Unity Catalog and Row Level Security.

As soon as we stop using serverless compute, we receive the following error on writes (overwrite, append, and upsert):
ErrorClass=INVALID_PARAMETER_VALUE.PATH_BASED_ACCESS_NOT_SUPPORTED_FOR_TABLES_WITH_ROW_COLUMN_ACCESS_POLICIES] Path-based access to table catalog_x.schema_y.rls_dbx_test with row filter or column mask not supported.

RLS and write to an external table:

  • works with a serverless cluster
  • does not work with LTS 13.4
    • both access modes tested
  • does not work with LTS 15.4
    • both access modes tested
  • does not work with LTS 16.4

 

 

Is there any way to make RLS work with an external table and a dedicated cluster?

Using dynamic views as an alternative introduces a lot more work and user complexity.



This is our test setup:

CREATE OR REPLACE FUNCTION us_filter(rls_entity STRING)
RETURN IF(CURRENT_USER()=='user_x', true, rls_entity='us');

ALTER TABLE catalog_x.schema_y.rls_dbx_test SET ROW FILTER us_filter ON (rls_entity);

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from delta.tables import DeltaTable

# Define catalog and schema
catalog = "catalog_x"
schema_name = "schema_y"
table_name = "rls_dbx_test"

### CASE OVERWRITE ###
# Sample data
data = [
    ("logger_01", 1909, "us"),
    ("logger_02", 1909, "row"),
    ("logger_03", 1909, "us"),
    ("logger_04", 1909, "bra")
]

# Define schema
schema = StructType([
    StructField("data_object_name", StringType(), True),
    StructField("year", IntegerType(), True),
    StructField("rls_entity", StringType(), True)
])

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Set current catalog and schema
spark.sql(f"USE CATALOG {catalog}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema_name}")
spark.sql(f"USE {schema_name}")

spark.sql(f"""
      CREATE TABLE IF NOT EXISTS `{catalog}`.`{schema_name}`.`{table_name}`
      USING DELTA
      LOCATION 'abfss://placeholder_x@placeholder_y'
""")

# Write as Delta table
df.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable(f"{catalog}.{schema_name}.{table_name}")

### CASE APPEND ###
data = [
    ("logger_05", 1909, "us"),
    ("logger_06", 1909, "row"),    ]

# Define schema
schema = StructType([
    StructField("data_object_name", StringType(), True),
    StructField("year", IntegerType(), True),
    StructField("rls_entity", StringType(), True)
])

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Write as Delta table
df.write.format("delta").mode("append").saveAsTable(f"{catalog}.{schema_name}.{table_name}")

### CASE UPSERT ###
data = [
    ("logger_04", 1909, "ch"),
    ("logger_07", 1909, "us"),
    ("logger_08", 1909, "eu"),    ]

# Define schema
schema = StructType([
    StructField("data_object_name", StringType(), True),
    StructField("year", IntegerType(), True),
    StructField("rls_entity", StringType(), True)
])

# Create DataFrame
df = spark.createDataFrame(data, schema)

full_table_name = f"{catalog}.{schema_name}.{table_name}"

delta_table = DeltaTable.forName(spark, full_table_name)
delta_table.alias("target").merge(
        source=df.alias("source"),
        condition="target.data_object_name == source.data_object_name"
    ).whenMatchedUpdateAll() \
     .whenNotMatchedInsertAll() \
     .execute()

Thanks

1 ACCEPTED SOLUTION


lmu
New Contributor III

After further testing, it was found that the dedicated access mode (formerly single user) either does not work or exhibits strange behaviour. In one scenario, the 16.4 cluster with dedicated access mode could write in append mode but not overwrite, and upsert led to nulls in the test table, which was very strange.
However, it worked with the standard (formerly shared) access mode.

Working solution:

  • 15.4 LTS classic (non-serverless) cluster with standard access mode (formerly shared)
  • 16.4 LTS classic (non-serverless) cluster with standard access mode (formerly shared)

Not working:

  • 15.4 LTS cluster with dedicated (single user) access mode
  • 16.4 LTS cluster with dedicated (single user) access mode
  • 13.4 LTS with either access mode (dedicated or shared)

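For reference, here is a rough sketch of how the working and failing configurations map onto a cluster definition. This is an illustration, not the poster's exact setup: the field names follow the Databricks Clusters API, where standard access mode (formerly shared) corresponds to data_security_mode "USER_ISOLATION" and dedicated (formerly single user) to "SINGLE_USER"; the node type is a placeholder.

# Sketch of a cluster spec matching the working setup (node type is a placeholder)
working_cluster_spec = {
    "cluster_name": "rls-write-test",
    "spark_version": "15.4.x-scala2.12",      # 16.4 LTS also worked with this access mode
    "node_type_id": "<node-type-placeholder>",
    "num_workers": 1,
    "data_security_mode": "USER_ISOLATION",   # standard access mode (formerly shared) - writes with RLS worked
}

# The failing variant differed only in the access mode:
#   "data_security_mode": "SINGLE_USER"      # dedicated access mode (formerly single user) - writes failed or behaved oddly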

11 REPLIES

Saritha_S
Databricks Employee

Hi @lmu 

Good day!!

Error:

Exception:ErrorClass=INVALID_PARAMETER_VALUE.PATH_BASED_ACCESS_NOT_SUPPORTED_FOR_TABLES_WITH_ROW_COLUMN_ACCESS_POLICIES] Path-based access to table catalog_x.schema_y.rls_dbx_test with row filter or column mask not supported.

Root Cause Analysis:

I would like to inform you that path-based access to files in tables with policies (row filter/column mask) is not supported.

(https://docs.databricks.com/en/tables/row-and-column-filters.html#limitations:~:text=Path%2Dbased%20...)

Solution:

Since the table is being accessed using the path, the correct approach is to reference it only by table name if you want to continue using these table policies.
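For illustration, a minimal sketch of the difference, reusing the placeholder names from the post above (df is the DataFrame from the test code):

# Path-based access: not supported once a row filter or column mask is applied
# df.write.format("delta").mode("append").save("abfss://placeholder_x@placeholder_y")
# spark.read.format("delta").load("abfss://placeholder_x@placeholder_y")

# Name-based access through Unity Catalog: reference the table by its three-part name
df.write.format("delta").mode("append").saveAsTable("catalog_x.schema_y.rls_dbx_test")
spark.table("catalog_x.schema_y.rls_dbx_test").show()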

lmu
New Contributor III

Hey @Saritha_S 
thanks for the quick response.

From my understanding, we are not using a path-based write, since we use saveAsTable().
But anyway, what would be the solution for writing to an external table with RLS on a dedicated cluster?
Could you send us a small example?

Here it says external tables are supported "Delta and Parquet for managed and external tables." (https://learn.microsoft.com/en-us/azure/databricks/tables/row-and-column-filters#support)

Thanks a lot!

Cheers Lukas

EktaPuri
New Contributor III

Try creating a view that applies the RLS logic and grant access to the view rather than the table; it would be simpler.
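For instance, a rough sketch of that alternative, reusing the placeholder names from the test setup above (the view name is illustrative; the row filter would be dropped from the base table and the logic moved into the view):

# Drop the row filter from the base table and express the logic in a dynamic view instead
spark.sql("ALTER TABLE catalog_x.schema_y.rls_dbx_test DROP ROW FILTER")

spark.sql("""
  CREATE OR REPLACE VIEW catalog_x.schema_y.rls_dbx_test_v AS
  SELECT *
  FROM catalog_x.schema_y.rls_dbx_test
  -- same logic as the us_filter function: user_x sees everything, everyone else only 'us' rows
  WHERE CURRENT_USER() = 'user_x' OR rls_entity = 'us'
""")

# Consumers then get SELECT on the view only, while direct access to the base table is revoked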

lmu
New Contributor III

Hey @EktaPuri 
as mentioned in the original post this is not optimal for our setup.
"Using dynamic views as an alternative introduces a lot more work and user complexity."

It would be great if there were a way to write to external tables with RLS on a dedicated cluster.

EktaPuri
New Contributor III

Okay, which cluster are you trying to run this on? Even when we use a dedicated cluster, the query is passed to serverless for data filtering.

EktaPuri_0-1749714127759.png

 

lmu
New Contributor III

Hey @EktaPuri, as stated in my original post:
"RLS and write to an external table:

  • works with a serverless cluster
  • does not work with LTS 13.4
    • both access modes tested
  • does not work with LTS 15.4
    • both access modes tested
  • does not work with LTS 16.4"

EktaPuri
New Contributor III

One thing I observed is that you created your function without a catalog and schema name. Can you run the SHOW FUNCTIONS command? The function may be getting registered in hive_metastore, which could be the reason; when I tried it on my side it works.

Command: SHOW FUNCTIONS

Then create the function as catalog_name.schema_name.function_name.

I am using a standard access mode cluster on 15.4 LTS.
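For example, something along these lines, based on the test setup from the original post (catalog_x/schema_y are the same placeholders; this is a sketch, not a verified fix):

# Check where us_filter is currently registered
spark.sql("SHOW FUNCTIONS LIKE 'us_filter'").show(truncate=False)

# Recreate the filter function fully qualified in Unity Catalog
spark.sql("""
  CREATE OR REPLACE FUNCTION catalog_x.schema_y.us_filter(rls_entity STRING)
  RETURN IF(CURRENT_USER() = 'user_x', true, rls_entity = 'us')
""")

# Re-apply the row filter using the fully qualified function name
spark.sql("""
  ALTER TABLE catalog_x.schema_y.rls_dbx_test
  SET ROW FILTER catalog_x.schema_y.us_filter ON (rls_entity)
""")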

lmu
New Contributor III

We are using Unity Catalog.
The function is registered in the correct catalog and applied to the table:

lmu_0-1749715266326.png

 

lmu
New Contributor III

@Saritha_S 

Is there any chance of writing to an external table with RLS on a dedicated cluster?
Could you send us a small example?

As mentioned above, the documentation describes this as a functioning feature:
"Delta and Parquet for managed and external tables." (https://learn.microsoft.com/en-us/azure/databricks/tables/row-and-column-filters#support)

LonaOsmani
New Contributor II

Hi @lmu ,

To use dedicated compute to write to a table with row level security:
- Serverless must be enabled in your workspace (because the dedicated compute passes the query to serverless to perform the data filtering)
- The dedicated compute must be on Databricks Runtime 16.3 or above

(https://learn.microsoft.com/en-us/azure/databricks/compute/single-user-fgac#support-for-write-operat... )


I think the error message you are encountering, PATH_BASED_ACCESS_NOT_SUPPORTED_FOR_TABLES_WITH_ROW_COLUMN_ACCESS_POLICIES, comes from the 'CREATE TABLE ... LOCATION ...' statement. The solution would be to first check whether the table already exists before issuing it, for example via the system tables.
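A minimal sketch of that check, assuming the same placeholder variables (catalog, schema_name, table_name, df) from the test code above; spark.catalog.tableExists is the PySpark catalog API:

full_table_name = f"{catalog}.{schema_name}.{table_name}"

# Only register the external table by path when it does not exist yet,
# so the path-based CREATE never runs against the RLS-protected table
if not spark.catalog.tableExists(full_table_name):
    spark.sql(f"""
        CREATE TABLE `{catalog}`.`{schema_name}`.`{table_name}`
        USING DELTA
        LOCATION 'abfss://placeholder_x@placeholder_y'
    """)

# Subsequent writes go through the table name, not the path
df.write.format("delta").mode("overwrite").saveAsTable(full_table_name)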

