Databricks Delta MERGE fails with row filter — “Cannot find column index for attribute 'account_id'”

vim17
New Contributor II

Problem:
I’m getting the error below when performing a MERGE (or any other DML command) on a Delta table with a row filter in Databricks.
Error: Cannot find column index for attribute 'account_id#48219' in: Map(transaction_id#47260 -> 5, file_path#47253 -> 0, numPhysicalRecords#49686L -> 3, defaultRowCommitVersion#47262L -> 7, version#47265L -> 9, size#47255L -> 1, account_number#48983 -> 10, baseRowId#47261L -> 6, transaction_tags#47259 -> 4, modificationTime#47256L -> 2, partitionValues_parsed#47264 -> 8)

I’ve verified that account_id is present in the table. For some reason, the MERGE cannot access it, regardless of whether the row filter allows or blocks data for the user.


Details:
I’ve created a minimal repro that demonstrates the issue; please find it below.
Merge operation:
target_txns.alias("target") \
    .merge(source_txns.alias("source"), merge_cond) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()
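For reference, target_txns here is a DeltaTable handle and merge_cond joins target and source on the key column; a minimal sketch of that setup (the exact condition is illustrative, not the verbatim repro code):

from delta.tables import DeltaTable

# .merge() requires a DeltaTable handle for the target, not a DataFrame
target_txns = DeltaTable.forName(spark, "fin_data.customer_db.account_transactions")

# Illustrative merge condition; the real repro may match on more columns
merge_cond = "target.transaction_id = source.transaction_id"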

The target table has this row filter:
CREATE FUNCTION fin_data.customer_db.filter_account(acc_id STRING) RETURNS BOOLEAN
RETURN acc_id IN (
    SELECT account_id
    FROM fin_data.customer_db.authorized_accounts
    WHERE email = current_user()
);
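The filter is attached to the table on account_id (assumed here from the error trace and the schema below), e.g.:

# Bind the row filter function to the target table's account_id column
spark.sql("""
    ALTER TABLE fin_data.customer_db.account_transactions
    SET ROW FILTER fin_data.customer_db.filter_account ON (account_id)
""")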

I’m consistently facing this error. Any leads or help on this would be appreciated.

SP_6721
Honored Contributor II

Hi @vim17 ,

It might be due to the limitations mentioned here: Row filters and column masks
You could try simplifying the row filter function and then re-running the merge.
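For example, something like this stripped-down filter with no subquery (account IDs hard-coded purely for testing) would show whether the subquery against authorized_accounts is the trigger:

# Hypothetical simplified filter: a literal membership check instead of a subquery
spark.sql("""
    CREATE OR REPLACE FUNCTION fin_data.customer_db.filter_account_simple(acc_id STRING)
    RETURNS BOOLEAN
    RETURN acc_id IN ('ACC123', 'ACC124')
""")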

vim17
New Contributor II

I’ve verified that the row-filter and column-mask policies do not contain nesting, aggregations, window functions, limits, or non-deterministic functions.

Amruth_Ashok
Databricks Employee

Hi @vim17, I see "partitionValues_parsed#47264" in the error trace. Is the table partitioned, by any chance?
Which DBR version are you using?

vim17
New Contributor II

Yes, the table is partitioned on transaction_date.

Sample target table schema:
CREATE TABLE fin_data.customer_db.account_transactions (
    transaction_id STRING,
    account_id STRING,
    transaction_date DATE,
    transaction_type STRING,
    transaction_amount DECIMAL(18,2),
    currency STRING,
    merchant_name STRING,
    merchant_category STRING,
    transaction_tags ARRAY<STRING>,
    modificationTime TIMESTAMP
)
USING DELTA
PARTITIONED BY (transaction_date)


Target table sample data:
target_txns = spark.createDataFrame([
    ("txn_001", "ACC123", "2025-08-01", "DEBIT", 150.00, "USD", "Amazon", "E-commerce", ["online"], "2025-08-01 10:00:00"),
    ("txn_002", "ACC124", "2025-08-02", "CREDIT", 2000.00, "USD", "Payroll", "Salary", ["monthly"], "2025-08-02 09:00:00")
], ["transaction_id", "account_id", "transaction_date", "transaction_type", "transaction_amount", "currency", "merchant_name", "merchant_category", "transaction_tags", "modificationTime"])


Source table sample data:
source_txns = spark.createDataFrame([
    ("txn_003", "ACC123", "2025-08-03", "DEBIT", 75.00, "USD", "Starbucks", "Cafe", ["food"], "2025-08-03 08:00:00"),
    ("txn_004", "ACC125", "2025-08-04", "DEBIT", 300.00, "USD", "Best ", "Electronics", ["gadget"], "2025-08-04 15:00:00")
], ["transaction_id", "account_id", "transaction_date", "transaction_type", "transaction_amount", "currency", "merchant_name", "merchant_category", "transaction_tags", "modificationTime"])
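These sample rows are seeded into the partitioned table before the merge; a minimal sketch of that step, assuming the string columns are cast to the declared types:

from pyspark.sql.functions import col, to_date, to_timestamp

# Cast string columns to the declared schema types, then seed the target table
seed = (target_txns
    .withColumn("transaction_date", to_date(col("transaction_date")))
    .withColumn("transaction_amount", col("transaction_amount").cast("decimal(18,2)"))
    .withColumn("modificationTime", to_timestamp(col("modificationTime"))))

seed.write.format("delta").mode("append") \
    .saveAsTable("fin_data.customer_db.account_transactions")

# After seeding, target_txns is re-pointed at the Delta table for the merge
# (see the DeltaTable.forName sketch above)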
Currently, we are using DBR 15.4 and 16.4.