I am facing an issue when merging a DataFrame into a Delta table that has a column mask applied to two of its columns.
Code
DeltaTable.forName(sparkSession=spark,tableOrViewName=f'{catalog}.{schema}.{table_name}').alias('target').merge(
    new_df.alias('updates'),
    'updates.customerID = target.customerID'
).whenNotMatchedInsertAll().execute()
Error
[MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "first_name" missing from "customerID", "first_name", "last_name", "city", "state" in operator !Project [customerID#153L, redact AS first_name#178, redact AS last_name#179, redact AS city#180, redact AS state#181]. Attribute(s) with the same name appear in the operation: "first_name".
Please check if the right attribute(s) are used. SQLSTATE: XX000
File <command-6480109877101749>, line 4
1 DeltaTable.forName(sparkSession=spark,tableOrViewName=f'{catalog}.{schema}.{table_name}').alias('target').merge(
2 new_df.alias('updates'),
3 'updates.customerID = target.customerID'
----> 4 ).whenNotMatchedInsertAll().execute()
File /databricks/spark/python/pyspark/sql/connect/client/core.py:2377, in SparkConnectClient._handle_rpc_error(self, rpc_error)
2363 raise SparkConnectGrpcException(
2364 "Python versions in the Spark Connect client and server are different. "
2365 "To execute user-defined functions, client and server should have the "
(...)
2373 "sqlState", default=SparkConnectGrpcException.CLIENT_UNEXPECTED_MISSING_SQL_STATE),
2374 ) from None
2375 # END-EDGE
-> 2377 raise convert_exception(
2378 info,
2379 status.message,
2380 self._fetch_enriched_error(info),
2381 self._display_server_stack_trace(),
2382 ) from None
2384 raise SparkConnectGrpcException(
2385 message=status.message,
2386 sql_state=SparkConnectGrpcException.CLIENT_UNEXPECTED_MISSING_SQL_STATE, # EDGE
2387 ) from None
2388 else:
The same merge works fine when run through spark.sql or in a %sql cell; it fails only with the Python DeltaTable syntax shown above.
If the column mask is removed from the table, the Python merge also works fine.
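For reference, the SQL route that works looks roughly like the sketch below. The helper names build_merge_sql and merge_via_sql and the temp view name updates_tmp are placeholders made up for illustration, not part of any library; the only real calls are createOrReplaceTempView and spark.sql. It assumes the join key is customerID, as in the failing code.

```python
def build_merge_sql(target_table: str, source_view: str, key: str = "customerID") -> str:
    """Build a MERGE statement that inserts source rows with no match in the target."""
    return (
        f"MERGE INTO {target_table} AS target "
        f"USING {source_view} AS updates "
        f"ON updates.{key} = target.{key} "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def merge_via_sql(spark, new_df, target_table: str, source_view: str = "updates_tmp") -> None:
    """Run the merge through spark.sql instead of the DeltaTable builder."""
    # Register the DataFrame under a temporary view name (hypothetical name)
    # so the SQL statement can reference it.
    new_df.createOrReplaceTempView(source_view)
    spark.sql(build_merge_sql(target_table, source_view))


# Example call, assuming spark, new_df, catalog, schema and table_name exist
# in the notebook as in the failing code above:
# merge_via_sql(spark, new_df, f"{catalog}.{schema}.{table_name}")
```

This only sidesteps the error by going through SQL; it does not explain why the DeltaTable builder fails when a column mask is present.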