02-16-2024 12:52 AM
Hello everyone,
We have switched from DBR 13.3 to 14.3 on our Shared development cluster, and I am no longer able to run the following read from a Delta table with CDC enabled:
data = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", x)
    .option("endingVersion", x)
    .table(f"bronze.{table_name}")
    .select("GJAHR")
)
The same select works fine on a single user cluster with DBR 14.3, on a shared cluster with DBR 13.3, and also when I use the following SQL equivalent on a shared cluster with DBR 14.3:
SELECT "GJAHR"
FROM table_changes('bronze.{table_name}', x, x)
The issue seems to be that it somehow cannot match the selected field to what is available in the table. If I run the code without the .select("GJAHR"), it works fine. Also, if I select only the CDC fields like _commit_version, everything runs well. Here is an excerpt from the error message produced by the first code snippet:
AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "GJAHR" missing from "RCLNT", "RLDNR", "RBUKRS", "GJAHR", ...
!Project [GJAHR#72222]. Attribute(s) with the same name appear in the operation: "GJAHR".
Please check if the right attribute(s) are used. SQLSTATE: XX000;
Aggregate [count(1) AS count(1)#72723L]
+- !Project [GJAHR#72222]
+- ...
+- Relation snpdwh.bronze_sap.acdoca[RCLNT#72724,RLDNR#72725,RBUKRS#72726,GJAHR#72727,...
DBR 14.3 is not in beta anymore, so everything should work fine. The type of compute (apart from the access mode mentioned above) plays no role. Databricks is hosted on Azure.
Is this a bug or do you see any errors in my logic?
Thanks.
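A note on reading the error excerpt above: Spark identifies columns in a query plan by an expression ID (the number after the #), not only by name. The small sketch below parses the two plan lines quoted in the error and compares the IDs. It is plain Python with no Spark dependency; the helper name attributes is made up for illustration, and the sketch only shows what the message reports, not why the Shared cluster produces it.

```python
import re

# Plan lines copied from the error message above (the trailing "..." is kept as-is).
PROJECT_LINE = "!Project [GJAHR#72222]"
RELATION_LINE = (
    "Relation snpdwh.bronze_sap.acdoca"
    "[RCLNT#72724,RLDNR#72725,RBUKRS#72726,GJAHR#72727,..."
)

# An attribute reference in a plan looks like NAME#ID.
ATTR = re.compile(r"(\w+)#(\d+)")


def attributes(plan_line: str) -> dict[str, int]:
    """Map each attribute name found in a plan line to its expression ID."""
    return {name: int(expr_id) for name, expr_id in ATTR.findall(plan_line)}


requested = attributes(PROJECT_LINE)   # what the Project node asks for
available = attributes(RELATION_LINE)  # what the Relation node provides

for name, expr_id in requested.items():
    print(name, "requested as", expr_id, "available as", available.get(name))
# prints: GJAHR requested as 72222 available as 72727
```

The name GJAHR matches, but the Project node refers to ID 72222 while the relation exposes ID 72727, which is what "Attribute(s) with the same name appear in the operation" is pointing at.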
02-16-2024 01:00 AM
Hi @Jaris, It appears that you’ve encountered an issue when reading from a Delta table with CDC (Change Data Capture) enabled after switching from Databricks Runtime (DBR) 13.3 to 14.3.
Let’s break down the situation and explore potential solutions:
Attribute Mismatch Error: The error message you provided indicates that there’s an issue with attribute resolution. Specifically, it states that the resolved attribute “GJAHR” requested by the projection is missing from the attributes available to it (“RCLNT,” “RLDNR,” “RBUKRS,” “GJAHR,” and so on), even though an attribute with the same name appears in that list. In other words, the name matches, but the attribute reference does not.
Code Comparison: You mentioned that the same select works fine on a single-user cluster with DBR 14.3 and also when using SQL equivalent on a shared cluster with DBR 14.3. However, the issue arises when you explicitly select the “GJAHR” column in your Python code snippet.
Potential Causes and Solutions:
Use DESCRIBE DETAIL <table_name> to inspect the table’s location and other details (see reference 1).
Bug or Logic Error?: While it’s challenging to definitively say whether this is a bug or a logic error without examining the complete context, I recommend thoroughly reviewing your code, schema, and any recent changes.
DBR 14.3 Update: As you mentioned, DBR 14.3 is out of beta, and theoretically, it should work seamlessly. However, it’s essential to rule out any specific issues related to your environment, configuration, or table setup.
1: Databricks Community Forum: AnalysisException: is not a Delta table
02-16-2024 01:42 AM
Hello Kaniz,
Thank you for your comprehensive answer.
Unfortunately, none of those points apply to my case. The selected column is present exactly once in the source table, and there is no more code; this is all I am running to reproduce the issue:
data = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 161)
    .option("endingVersion", 161)
    .table("table_name")
    .select("GJAHR")
)
data.count()
I just switch between 2 computes, one Single User and one Shared, both running on the same DBR 14.3, and I get the error only with the Shared cluster.
Thank you.
02-16-2024 01:58 AM
Hi @Jaris, Thank you for providing additional details. I apologize for the inconvenience you’re experiencing.
Let’s explore further to identify the root cause of the issue.
Given that the attribute “GJAHR” is present exactly once in the source table and there is no additional code, it’s puzzling that you encounter the error only on the Shared cluster.
Here are a few more steps to investigate:
Cluster Configuration:
Attribute Resolution Order:
Column Aliasing:
Schema Inspection:
Cluster-Specific Behavior:
02-16-2024 02:27 AM
Hello Kaniz,
Thanks again for your effort.
I have tried everything, except the column alias in this form, but that didn't help either.
Cluster settings are also not an issue. Just to be sure, I have created a new cluster, left everything on default and only changed the DBR to 14.3. In Single User mode the code runs seamlessly. When I change only the access mode to Shared and restart, the issue appears.
If you have access to a Databricks instance, the issue should be pretty easy to replicate.
I am pretty sure at this point, this is a bug.
02-18-2024 09:27 PM
Hi @Jaris, Given the specific scenario you’ve described, it does indeed sound like an unexpected behaviour or bug within Databricks Runtime 14.3 on shared clusters.
I appreciate your diligence in troubleshooting, and I hope you find a resolution soon. If there’s anything else I can assist with, feel free to ask! 🌟
02-19-2024 12:09 AM
Hello Kaniz,
Is it possible to report this bug? In my case there are several workarounds, as I mentioned above, but it would be helpful to have this fixed in the future.
Thank you.
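For anyone landing here with the same error, one of the workarounds mentioned in this thread (the SQL table_changes route, which was reported to work on the Shared cluster with DBR 14.3) can be driven from Python as sketched below. The helper table_changes_query is hypothetical and not part of any Databricks API, and bronze.table_name is a placeholder. Column names are backtick-quoted because, depending on configuration, Spark SQL may parse a double-quoted name as a string literal rather than a column.

```python
def table_changes_query(table: str, columns: list[str], start: int, end: int) -> str:
    """Build a SELECT over table_changes() for a version range.

    Hypothetical helper for illustration only.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if "'" in table:
        raise ValueError("table name must not contain a single quote")
    # Backtick-quote each column name, doubling any embedded backtick.
    cols = ", ".join("`" + c.replace("`", "``") + "`" for c in columns)
    return f"SELECT {cols} FROM table_changes('{table}', {int(start)}, {int(end)})"


query = table_changes_query("bronze.table_name", ["GJAHR"], 161, 161)
print(query)
# prints: SELECT `GJAHR` FROM table_changes('bronze.table_name', 161, 161)

# On a cluster, with the notebook's SparkSession available as `spark`:
# data = spark.sql(query)
# data.count()
```

The other workarounds reported above, running on a Single User cluster or reading the change feed without the .select("GJAHR"), need no extra code.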