โ12-18-2024 09:39 AM - edited โ12-18-2024 09:42 AM
Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and canโt find a simple way to get this data in SINGLE_USER mode. Is there a solution or an alternative method to retrieve this information?
#merge
#SINGLE_USER
#USER_ISOLATION
โ12-18-2024 11:56 AM
15.4 does not directly required the serverless but for fine-grained it indeed requires it to run it on Single User as mentioned
This data filtering is performed behind the scenes using serverless compute.
โ12-18-2024 09:58 AM
Can you share the complete error message you are receiving? Also share more details around the cluster configuration you are currently using when running single user cluster?
โ12-18-2024 10:44 AM
I have a piece of code that performs a merge operation followed by merge_result.collect(). This code is executed in two different scenarios:
Through Databricks Jobs with USER_ISOLATION access mode.
In this case, merge_result.collect() works correctly and returns the expected result.
Example output: Row(num_affected_rows=219921, num_updated_rows=0, num_deleted_rows=0, num_inserted_rows=219921)
Through Databricks Jobs with SINGLE_USER access mode.
In this case, merge_result.collect() returns None, causing the following error:
AttributeError: 'NoneType' object has no attribute 'collect'
The same code is deployed in both scenarios via GitHub Actions using Databricks Bundles.
Environment Details
Databricks Runtime: 14.3
Cluster Access Modes:
SINGLE_USER (causes the issue)
USER_ISOLATION (works correctly)
The merge operation code is as follows:
merge_result = (
target_df.alias("target")
.merge(source_df.alias("source"), merge_condition)
.whenMatchedUpdate(set=update_columns)
.whenNotMatchedInsert(values=insert_columns)
.execute()
)
row = merge_result.collect()[0]
How do the access modes (USER_ISOLATION vs. SINGLE_USER) impact the execution of merge operations?
What alternatives to merge_result.collect() exist for retrieving merge results that would work in SINGLE_USER mode?
Are there any recommended practices or patterns to ensure the merge operation and result retrieval work correctly in this mode?
โ12-18-2024 09:59 AM
In SINGLE_USER
mode, there are limitations, including restrictions on accessing certain tables and views, especially those with fine-grained access controls like row filters or column masks. This mode is designed to ensure that only the user who owns the cluster can access the data, which can lead to issues when trying to collect results from operations like merge
.
โ12-18-2024 10:36 AM
Given the limitations of SINGLE_USER mode, could you suggest any alternative approach or workaround to collect the affected rows (inserted and updated) in this mode? Is there a way to enable access or adjust settings to allow this in SINGLE_USER mode? Additionally, could you clarify the potential risks or drawbacks of using USER_ISOLATION mode for this purpose?
โ12-18-2024 11:00 AM - edited โ12-18-2024 11:00 AM
Can you test DBR 15.4 LTS, seems that this DBR version and above supports fine-grained access control on single user compute, which may resolve your issue
โ12-18-2024 11:05 AM
Doesn't DBR 15.4 LTS require serverless compute to be enabled on the workspace? If so, wouldn't this lead to an increase in costs for the project?
โ12-18-2024 11:56 AM
15.4 does not directly required the serverless but for fine-grained it indeed requires it to run it on Single User as mentioned
This data filtering is performed behind the scenes using serverless compute.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group