Databricks Community

Jaris · ‎02-16-2024

Hello everyone,

We have switched from DBR 13.3 to 14.3 on our Shared development cluster, and I am no longer able to run the following read from a Delta table with the change data feed (CDC) enabled:

data = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", x)
    .option("endingVersion", x)
    .table(f"bronze.{table_name}")
    .select("GJAHR")
)

The same select works fine on a Single User cluster with DBR 14.3 and on a Shared cluster with DBR 13.3. It also works when I use the following SQL equivalent on a Shared cluster with DBR 14.3:

SELECT GJAHR
FROM table_changes('bronze.{table_name}', x, x)

The issue seems to be that the selected field somehow cannot be matched to what is available in the table. If I run the code without the .select("GJAHR"), it works fine. If I select only CDC fields such as _commit_version, everything also runs well. Here is an excerpt from the error message produced by the first code snippet:

AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "GJAHR" missing from "RCLNT", "RLDNR", "RBUKRS", "GJAHR", ...

!Project [GJAHR#72222]. Attribute(s) with the same name appear in the operation: "GJAHR".
Please check if the right attribute(s) are used. SQLSTATE: XX000;
Aggregate [count(1) AS count(1)#72723L]
+- !Project [GJAHR#72222]
   +- ...
      +- Relation snpdwh.bronze_sap.acdoca[RCLNT#72724,RLDNR#72725,RBUKRS#72726,GJAHR#72727,...
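A small sketch for reading the plan above: each attribute is printed as name#id, and the Project node asks for GJAHR#72222 while the Relation exposes GJAHR#72727. The name matches but the ID does not, which is what the "missing from" message refers to. The helper name attribute_ids is made up for illustration, and the plan text is just the two relevant lines copied from the excerpt (still truncated where the original is).

```python
import re

# Two lines copied from the error excerpt above; the trailing "..." is
# the truncation already present in the original message.
plan = """\
+- !Project [GJAHR#72222]
   +- Relation snpdwh.bronze_sap.acdoca[RCLNT#72724,RLDNR#72725,RBUKRS#72726,GJAHR#72727,...
"""

# Matches tokens such as GJAHR#72222 (attribute name, then its numeric ID).
ATTR = re.compile(r"(\w+)#(\d+)")


def attribute_ids(line):
    """Return {attribute name: ID} for every name#id token in one plan line."""
    return {name: int(attr_id) for name, attr_id in ATTR.findall(line)}


project_line, relation_line = plan.splitlines()
requested = attribute_ids(project_line)   # what the Project node refers to
available = attribute_ids(relation_line)  # what the Relation actually outputs

# Attributes whose name exists in the relation but under a different ID.
mismatched = {
    name: (attr_id, available[name])
    for name, attr_id in requested.items()
    if name in available and available[name] != attr_id
}
print(mismatched)  # {'GJAHR': (72222, 72727)}
```

This only parses the printed plan; it says nothing about why the Shared cluster ends up with a stale ID.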

DBR 14.3 is no longer in beta, so everything should work fine. The type of compute (apart from the access mode mentioned above) plays no role. Databricks is hosted on Azure.

Is this a bug or do you see any errors in my logic?

Thanks.

Jaris · ‎02-16-2024

Hello Kaniz,

Thank you for your comprehensive answer.

Unfortunately, none of those points apply to my case. The selected column is present exactly once in the source table, and there is no other code. This is all I am running to reproduce the issue:

data = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 161)
    .option("endingVersion", 161)
    .table("table_name")
    .select("GJAHR")
)
data.count()

I just switch between two clusters, one Single User and one Shared, both running the same DBR 14.3, and I get the error only on the Shared cluster.

Thank you.

Jaris · ‎02-16-2024

Hello Kaniz,

Thanks again for your effort.

I had already tried everything except the column alias in this form, and that didn't help either.

Cluster settings are not the issue either. Just to be sure, I created a new cluster, left everything on default, and only changed the DBR to 14.3. In Single User mode the code runs seamlessly. When I change only the access mode to Shared and restart, the issue appears.

If you have access to a Databricks instance, the issue should be pretty easy to replicate.

At this point I am pretty sure this is a bug.

Jaris · ‎02-19-2024

Hello Kaniz,

Is it possible to report this bug? In my case there are several workarounds, as I've mentioned above, but it would be helpful to have this fixed in the future.

Thank you.
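
For anyone landing here with the same error, the SQL workaround mentioned above can be driven from Python roughly like this. It is only a sketch: build_table_changes_query and read_changes are names made up for illustration, spark is assumed to be the session already available in the notebook, and the table name and version numbers are the ones from the posts above. Column names are backtick-quoted because Spark SQL by default treats double-quoted text as a string literal rather than a column.

```python
def build_table_changes_query(table_name, start_version, end_version, columns):
    """Build a SELECT over table_changes() for the given columns and versions."""
    # Backticks make Spark SQL parse each name as an identifier.
    column_list = ", ".join(f"`{c}`" for c in columns)
    return (
        f"SELECT {column_list} "
        f"FROM table_changes('{table_name}', "
        f"{int(start_version)}, {int(end_version)})"
    )


def read_changes(spark, table_name, start_version, end_version, columns):
    """Run the query on an existing SparkSession and return the DataFrame."""
    query = build_table_changes_query(
        table_name, start_version, end_version, columns
    )
    return spark.sql(query)


# Building the query needs no cluster; running it does.
print(build_table_changes_query("bronze.table_name", 161, 161, ["GJAHR"]))
# SELECT `GJAHR` FROM table_changes('bronze.table_name', 161, 161)
```

In a notebook, read_changes(spark, "bronze.table_name", 161, 161, ["GJAHR"]).count() would then take the place of the failing readChangeFeed read. The names are interpolated into the SQL string as-is, so this should only be used with trusted table and column names.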


CDC Delta table select using startingVersion on Shared cluster running DBR 14.3 does not work
