Databricks Community

GS_S · 5 hours ago

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an alternative method to retrieve this information?

#merge

#SINGLE_USER

#USER_ISOLATION

Walter_C · 3 hours ago

15.4 does not directly required the serverless but for fine-grained it indeed requires it to run it on Single User as mentioned

This data filtering is performed behind the scenes using serverless compute.

In terms of costs:
Customers are charged for the serverless compute resources that are used to perform data filtering operations. For pricing information, see Platform Tiers and Add-Ons.

If you dont want to enable Serverless on this case you will need to continue using shared access mode, which the main implication is that multiple users will be able to use this cluster if they have permissions to.

View solution in original post

Walter_C · 5 hours ago

Can you share the complete error message you are receiving? Also share more details around the cluster configuration you are currently using when running single user cluster?

GS_S · 4 hours ago

I have a piece of code that performs a merge operation followed by merge_result.collect(). This code is executed in two different scenarios:
Through Databricks Jobs with USER_ISOLATION access mode.
In this case, merge_result.collect() works correctly and returns the expected result.
Example output: Row(num_affected_rows=219921, num_updated_rows=0, num_deleted_rows=0, num_inserted_rows=219921)

Through Databricks Jobs with SINGLE_USER access mode.
In this case, merge_result.collect() returns None, causing the following error:
AttributeError: 'NoneType' object has no attribute 'collect'

The same code is deployed in both scenarios via GitHub Actions using Databricks Bundles.

Environment Details
Databricks Runtime: 14.3

Cluster Access Modes:

SINGLE_USER (causes the issue)
USER_ISOLATION (works correctly)

The merge operation code is as follows:

merge_result = (
target_df.alias("target")
.merge(source_df.alias("source"), merge_condition)
.whenMatchedUpdate(set=update_columns)
.whenNotMatchedInsert(values=insert_columns)
.execute()
)

row = merge_result.collect()[0]

How do the access modes (USER_ISOLATION vs. SINGLE_USER) impact the execution of merge operations?
What alternatives to merge_result.collect() exist for retrieving merge results that would work in SINGLE_USER mode?

Are there any recommended practices or patterns to ensure the merge operation and result retrieval work correctly in this mode?

Walter_C · 5 hours ago

In SINGLE_USER mode, there are limitations, including restrictions on accessing certain tables and views, especially those with fine-grained access controls like row filters or column masks. This mode is designed to ensure that only the user who owns the cluster can access the data, which can lead to issues when trying to collect results from operations like merge.

GS_S · 4 hours ago

Given the limitations of SINGLE_USER mode, could you suggest any alternative approach or workaround to collect the affected rows (inserted and updated) in this mode? Is there a way to enable access or adjust settings to allow this in SINGLE_USER mode? Additionally, could you clarify the potential risks or drawbacks of using USER_ISOLATION mode for this purpose?

Walter_C · 4 hours ago

Can you test DBR 15.4 LTS, seems that this DBR version and above supports fine-grained access control on single user compute, which may resolve your issue

GS_S · 4 hours ago

Doesn't DBR 15.4 LTS require serverless compute to be enabled on the workspace? If so, wouldn't this lead to an increase in costs for the project?

Walter_C · 3 hours ago

15.4 does not directly required the serverless but for fine-grained it indeed requires it to run it on Single User as mentioned

This data filtering is performed behind the scenes using serverless compute.

In terms of costs:
Customers are charged for the serverless compute resources that are used to perform data filtering operations. For pricing information, see Platform Tiers and Add-Ons.

If you dont want to enable Serverless on this case you will need to continue using shared access mode, which the main implication is that multiple users will be able to use this cluster if they have permissions to.

Databricks Community

Error during merge operation: 'NoneType' object has no attribute 'collect'

Connect with Databricks Users in Your Area

Databricks Learning Festival (Virtual): 15 January - 31 January 2025

Milestone: DatabricksTV Reaches 100 Videos!

Announcing the new Meta Llama 3.3 model on Databricks

Databricks Community Champion - December 2024 - Sujesh Menon

Dotmatics and Databricks Partner to Advance Scientific Intelligence in Life Sciences