Databricks Free Trial Help
Engage in discussions about the Databricks Free Trial within the Databricks Community. Share insights, tips, and best practices for getting started, troubleshooting issues, and maximizing the value of your trial experience to explore Databricks' capabilities effectively.

Merge issue with column mask delta tables

arshadnehal
New Contributor II

I'm facing an issue when merging a DataFrame into a Delta table that has a column mask applied to two of its columns.

Code 

DeltaTable.forName(sparkSession=spark,tableOrViewName=f'{catalog}.{schema}.{table_name}').alias('target').merge(
    new_df.alias('updates'),
    'updates.customerID = target.customerID'
).whenNotMatchedInsertAll().execute()

Error

[MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "first_name" missing from "customerID", "first_name", "last_name", "city", "state" in operator !Project [customerID#153L, redact AS first_name#178, redact AS last_name#179, redact AS city#180, redact AS state#181]. Attribute(s) with the same name appear in the operation: "first_name".
Please check if the right attribute(s) are used. SQLSTATE: XX000
File <command-6480109877101749>, line 4
      1 DeltaTable.forName(sparkSession=spark,tableOrViewName=f'{catalog}.{schema}.{table_name}').alias('target').merge(
      2     new_df.alias('updates'),
      3     'updates.customerID = target.customerID'
----> 4 ).whenNotMatchedInsertAll().execute()
File /databricks/spark/python/pyspark/sql/connect/client/core.py:2377, in SparkConnectClient._handle_rpc_error(self, rpc_error)
   2363                 raise SparkConnectGrpcException(
   2364                     "Python versions in the Spark Connect client and server are different. "
   2365                     "To execute user-defined functions, client and server should have the "
   (...)
   2373                         "sqlState", default=SparkConnectGrpcException.CLIENT_UNEXPECTED_MISSING_SQL_STATE),
   2374                 ) from None
   2375             # END-EDGE
-> 2377             raise convert_exception(
   2378                 info,
   2379                 status.message,
   2380                 self._fetch_enriched_error(info),
   2381                 self._display_server_stack_trace(),
   2382             ) from None
   2384     raise SparkConnectGrpcException(
   2385         message=status.message,
   2386         sql_state=SparkConnectGrpcException.CLIENT_UNEXPECTED_MISSING_SQL_STATE,  # EDGE
   2387     ) from None
   2388 else:

The merge works fine with spark.sql or with %sql, but fails with the Python DeltaTable syntax shown above.

If the column mask is removed from the table, it works fine.

2 REPLIES

BS_THE_ANALYST
Esteemed Contributor

@arshadnehal are you saying that you're not satisfied with the SQL solution and you're seeking the python equivalent?

Seems like an interesting problem!

All the best,
BS

Pat
Esteemed Contributor

It looks like the Delta Lake APIs (i.e. DeltaTable...) are not supported on tables with row filters and column masks.

Please see limitations: https://docs.databricks.com/aws/en/tables/row-and-column-filters#limitations
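
Since the original post reports that the same merge succeeds through spark.sql, one possible workaround is to keep the logic in Python but issue the merge as a SQL statement. The sketch below is only an illustration under that assumption: the helper names (build_insert_only_merge, merge_new_rows) and the temporary view name updates_tmp are hypothetical, and it has not been verified against a table with column masks.

```python
def build_insert_only_merge(target_table: str, source_view: str,
                            key: str = "customerID") -> str:
    """Build a MERGE statement that inserts source rows with no match in the target.

    Mirrors the DeltaTable call in the question: match on the key column,
    then insert all columns for unmatched rows.
    """
    return (
        f"MERGE INTO {target_table} AS target\n"
        f"USING {source_view} AS updates\n"
        f"ON updates.{key} = target.{key}\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def merge_new_rows(spark, new_df, target_table: str,
                   source_view: str = "updates_tmp") -> None:
    """Run the merge through spark.sql instead of the DeltaTable API."""
    # Expose the DataFrame to SQL under a temporary view
    # (the view name is a hypothetical placeholder).
    new_df.createOrReplaceTempView(source_view)
    spark.sql(build_insert_only_merge(target_table, source_view))
```

In a notebook this would be called as merge_new_rows(spark, new_df, f'{catalog}.{schema}.{table_name}'), using the same variables as in the question.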