Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error during merge operation: 'NoneType' object has no attribute 'collect'

GS_S
Visitor

Why does calling collect() on a merge result fail in SINGLE_USER access mode when it works in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can't find a simple way to get this data in SINGLE_USER mode. Is there a solution or an alternative method to retrieve this information?

#merge #SINGLE_USER #USER_ISOLATION


7 REPLIES

Walter_C
Databricks Employee

Can you share the complete error message you are receiving, along with more details about the cluster configuration you use when running on a single-user cluster?

GS_S
Visitor

I have a piece of code that performs a merge operation and then calls merge_result.collect(). The code runs in two different scenarios:
Through Databricks Jobs with USER_ISOLATION access mode.
In this case, merge_result.collect() works correctly and returns the expected result.
Example output: Row(num_affected_rows=219921, num_updated_rows=0, num_deleted_rows=0, num_inserted_rows=219921)

Through Databricks Jobs with SINGLE_USER access mode.
In this case, merge_result itself is None, so calling merge_result.collect() raises the following error:
AttributeError: 'NoneType' object has no attribute 'collect'

The same code is deployed in both scenarios via GitHub Actions using Databricks Bundles.

Environment Details
Databricks Runtime: 14.3

Cluster Access Modes:

SINGLE_USER (causes the issue)
USER_ISOLATION (works correctly)


The merge operation code is as follows:

merge_result = (
    target_df.alias("target")
    .merge(source_df.alias("source"), merge_condition)
    .whenMatchedUpdate(set=update_columns)
    .whenNotMatchedInsert(values=insert_columns)
    .execute()
)
# Fails in SINGLE_USER mode, where merge_result is None.
row = merge_result.collect()[0]

How do the access modes (USER_ISOLATION vs. SINGLE_USER) impact the execution of merge operations?
What alternatives to merge_result.collect() exist for retrieving merge results that would work in SINGLE_USER mode?

Are there any recommended practices or patterns to ensure the merge operation and result retrieval work correctly in this mode?
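
One possible pattern for the result-retrieval question above, offered as a sketch rather than a confirmed fix: guard against a None merge result and fall back to the Delta table history, whose most recent entry records operationMetrics for the commit. The helper names merge_metrics and metrics_from_history are hypothetical. The sketch assumes target_df is a DeltaTable, that history metrics are enabled, and that num_affected_rows is the sum of inserted, updated, and deleted rows.

```python
from typing import Any, Dict, Mapping, Optional


def metrics_from_history(operation_metrics: Mapping[str, str]) -> Dict[str, int]:
    """Map Delta history operationMetrics (string values) to integer row counts."""

    def as_int(key: str) -> int:
        # Metrics are stored as strings; a missing key is treated as 0.
        return int(operation_metrics.get(key, 0))

    inserted = as_int("numTargetRowsInserted")
    updated = as_int("numTargetRowsUpdated")
    deleted = as_int("numTargetRowsDeleted")
    return {
        # Assumption: "affected" means inserted + updated + deleted.
        "num_affected_rows": inserted + updated + deleted,
        "num_updated_rows": updated,
        "num_deleted_rows": deleted,
        "num_inserted_rows": inserted,
    }


def merge_metrics(merge_result: Optional[Any], delta_table: Any) -> Dict[str, int]:
    """Return merge row counts whether or not execute() returned a DataFrame."""
    if merge_result is not None:
        # Access modes where execute() returns a one-row metrics DataFrame.
        return merge_result.collect()[0].asDict()

    # execute() returned None: read the latest commit from the table history.
    last_commit = delta_table.history(1).collect()[0]
    if last_commit["operation"] != "MERGE":
        raise RuntimeError(
            "Latest commit is %r, not MERGE" % last_commit["operation"]
        )
    # operationMetrics can be empty or null if history metrics are disabled.
    return metrics_from_history(last_commit["operationMetrics"] or {})
```

With the code from the post, this would be called as row = merge_metrics(merge_result, target_df). Note that history(1) returns the latest commit on the table, so if another writer commits between the merge and the history lookup, the metrics may belong to a different operation.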

Walter_C
Databricks Employee

In SINGLE_USER mode, there are limitations, including restrictions on accessing certain tables and views, especially those with fine-grained access controls like row filters or column masks. This mode is designed to ensure that only the user who owns the cluster can access the data, which can lead to issues when trying to collect results from operations like merge.

GS_S
Visitor

Given the limitations of SINGLE_USER mode, could you suggest an alternative approach or workaround to collect the affected rows (inserted and updated) in this mode? Is there a way to enable access or adjust settings to allow this in SINGLE_USER mode? Additionally, could you clarify the potential risks or drawbacks of using USER_ISOLATION mode for this purpose?

Walter_C
Databricks Employee

Can you test with DBR 15.4 LTS? It seems that this DBR version and above support fine-grained access control on single-user compute, which may resolve your issue.

GS_S
Visitor

Doesn't DBR 15.4 LTS require serverless compute to be enabled on the workspace? If so, wouldn't this increase costs for the project?

Walter_C
Databricks Employee (Accepted Solution)

DBR 15.4 does not itself require serverless compute, but fine-grained access control on single-user compute does require it, as noted in the documentation:

"This data filtering is performed behind the scenes using serverless compute."

In terms of costs:
Customers are charged for the serverless compute resources that are used to perform data filtering operations. For pricing information, see Platform Tiers and Add-Ons.

If you don't want to enable serverless compute in this case, you will need to continue using shared access mode. The main implication is that multiple users will be able to use the cluster if they have permission to.
