Hello everyone,
We have switched from DBR 13.3 to 14.3 on our shared development cluster, and I am no longer able to run the following read from a Delta table with CDF enabled:
data = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", x)
    .option("endingVersion", x)
    .table(f"bronze.{table_name}")
    .select("GJAHR")
)
The same select works fine on a single-user cluster with DBR 14.3, on a shared cluster with DBR 13.3, and also when I use the following SQL equivalent on a shared cluster with DBR 14.3:
SELECT "GJAHR"
FROM table_changes('bronze.{table_name}', x, x)
The issue seems to be that it somehow cannot match the selected field to what is available in the table. If I run the code without the .select("GJAHR"), it works fine. Also, if I select only the CDC fields like _commit_version, all runs well. Here is an excerpt from the error message produced by the first code snippet:
AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "GJAHR" missing from "RCLNT", "RLDNR", "RBUKRS", "GJAHR", ...
!Project [GJAHR#72222]. Attribute(s) with the same name appear in the operation: "GJAHR".
Please check if the right attribute(s) are used. SQLSTATE: XX000;
Aggregate [count(1) AS count(1)#72723L]
+- !Project [GJAHR#72222]
+- ...
+- Relation snpdwh.bronze_sap.acdoca[RCLNT#72724,RLDNR#72725,RBUKRS#72726,GJAHR#72727,...
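For now I am considering a workaround: since the SQL table_changes() path does work on the shared cluster, I could build that query string in Python and run it through spark.sql() instead of the DataFrameReader. A minimal sketch (the helper name cdf_query is mine, and I have not verified this avoids the bug in every case):

```python
def cdf_query(table_name: str, start_version: int, end_version: int) -> str:
    # table_changes() takes the qualified table name as a string literal,
    # plus the starting and ending versions (here a single version: x, x).
    return (
        f"SELECT GJAHR "
        f"FROM table_changes('bronze.{table_name}', {start_version}, {end_version})"
    )

# On the cluster this would then be executed as:
# data = spark.sql(cdf_query(table_name, x, x))
```

Note I deliberately leave GJAHR unquoted here, since in Spark SQL's default mode a double-quoted "GJAHR" is parsed as a string literal rather than a column reference.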
DBR 14.3 is no longer in beta, so everything should work fine. The type of compute (apart from the mentioned access mode) plays no role. Databricks is hosted on Azure.
Is this a bug or do you see any errors in my logic?
Thanks.