12-18-2024 10:44 AM
I have a piece of code that performs a merge operation followed by merge_result.collect(). This code is executed in two different scenarios:
1. Through Databricks Jobs with USER_ISOLATION access mode.
   In this case, merge_result.collect() works correctly and returns the expected result.
   Example output: Row(num_affected_rows=219921, num_updated_rows=0, num_deleted_rows=0, num_inserted_rows=219921)
2. Through Databricks Jobs with SINGLE_USER access mode.
   In this case, execute() returns None, so merge_result is None and merge_result.collect() fails with the following error:
   AttributeError: 'NoneType' object has no attribute 'collect'
The same code is deployed in both scenarios via GitHub Actions using Databricks Bundles.
Environment Details
Databricks Runtime: 14.3
Cluster Access Modes:
SINGLE_USER (causes the issue)
USER_ISOLATION (works correctly)
The merge operation code is as follows:
merge_result = (
    target_df.alias("target")
    .merge(source_df.alias("source"), merge_condition)
    .whenMatchedUpdate(set=update_columns)
    .whenNotMatchedInsert(values=insert_columns)
    .execute()
)
row = merge_result.collect()[0]

How do the access modes (USER_ISOLATION vs. SINGLE_USER) impact the execution of merge operations?
What alternatives to merge_result.collect() exist for retrieving merge results that would work in SINGLE_USER mode?
Are there any recommended practices or patterns to ensure the merge operation and result retrieval work correctly in this mode?
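
To make the last two questions concrete, here is a minimal sketch of the kind of pattern I have in mind. It is not a confirmed fix. It guards against execute() returning None and falls back to the operationMetrics column of the Delta table history. The helper names read_merge_metrics and metrics_from_history are hypothetical, and the sketch assumes target_df is a delta.tables.DeltaTable and that the most recent commit is the merge that just ran, which may not hold with concurrent writers.

```python
from typing import Any, Callable, Dict, Optional


def read_merge_metrics(
    merge_result: Optional[Any],
    fallback: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return merge metrics as a dict, whether or not execute() returned a DataFrame."""
    if merge_result is not None:
        # execute() returned a DataFrame-like object: use its single metrics row.
        rows = merge_result.collect()
        if rows:
            return rows[0].asDict()
    # execute() returned None (or no rows): ask the fallback instead.
    return fallback()


def metrics_from_history(delta_table: Any) -> Dict[str, Any]:
    """Hypothetical fallback: read operationMetrics of the latest commit.

    Assumes delta_table is a delta.tables.DeltaTable and that the latest
    commit in its history is the merge that just ran.
    """
    last_commit = delta_table.history(1).collect()[0]
    return dict(last_commit["operationMetrics"])
```

With the code above, the call would look like metrics = read_merge_metrics(merge_result, lambda: metrics_from_history(target_df)). As far as I understand, the history metrics use different key names (for example numTargetRowsInserted instead of num_inserted_rows) and hold string values, so downstream code would need to handle both shapes.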