Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
DBR14.3 Shared Access cluster delta.DeltaTable.toDF() issues

Olly
New Contributor II

Having issues with the PySpark DataFrames returned by delta.DeltaTable.toDF(), in what I believe is specific to shared access clusters on DBR 14.3. We recently created a near-identical workflow where the only major difference is that one of the source tables is a federated table, so we switched the job cluster to a shared access cluster to be able to access it.

Part of the downstream process is a merge operation, so we used the delta library directly and called DeltaTable.toDF() to get the DataFrame. A fair number of commands seem to be broken when using it afterwards: cells get marked as having completed successfully even though they're spitting out gRPC errors, and actions fail entirely with quite obscure errors.
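For reference, the merge is roughly this shape (a sketch with hypothetical table and column names, not our exact code):

import delta

# Target table for the merge; "target_table" is a made-up name
target = delta.DeltaTable.forName(spark, "target_table")

# This is the DataFrame that misbehaves downstream on the shared access cluster
df = target.toDF()

# Hypothetical source of incoming rows
updates = spark.table("updates_table")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)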

test data

# Create a one-row, one-column Delta table to reproduce the issue
(
    spark.createDataFrame([(1,)], "col: int")
    .write.format("delta")
    .mode("overwrite")
    .saveAsTable("test")
)


 
delta commands
import delta

# DataFrame obtained via the DeltaTable API rather than spark.table()
df = delta.DeltaTable.forName(spark, "test").toDF()

# Attribute-style column reference on that DataFrame
df.select(df.col)
The cell above appears to work, but spits this out under the cell:
2024-05-16 12:20:19,869 1872 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/client/core.py", line 1389, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INTERNAL
    details = "[CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "col". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704"
    debug_error_string = "UNKNOWN:Error received from peer unix:/databricks/sparkconnect/grpc.sock {grpc_message:"[CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column \"col\". It\'s probably because of illegal references like `df1.select(df2.col(\"a\"))`. SQLSTATE: 42704", grpc_status:13, created_time:"2024-05-16T12:20:19.869273292+00:00"}"
>

Running an action on it, e.g. .display(), fails outright, again saying it can't resolve the column.

Using spark.table() directly works as expected, and DeltaTable.merge() seems to work fine with a spark.table() source, so it's quite straightforward to get around, but it's still a nuisance as the behaviour is unusual. The original workflow on a single-user access mode cluster works fine, so I'm guessing this is an incompatibility between Delta and Spark Connect more than anything.
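In case it's useful to anyone hitting the same thing, the workaround is basically this (again a sketch with hypothetical names):

import delta

# Read the source via spark.table() instead of DeltaTable.toDF();
# the resulting DataFrame behaves normally afterwards
src = spark.table("updates_table")

# The DeltaTable API itself is still fine for the merge
(
    delta.DeltaTable.forName(spark, "target_table")
    .alias("t")
    .merge(src.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .execute()
)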


2 REPLIES

shan_chandra
Esteemed Contributor
Accepted Solution

Hi @Olly - can you please try the following?

import delta
from pyspark.sql.functions import col

# Reference the column by name with col() instead of df.col
df = delta.DeltaTable.forName(spark, "test").toDF()
df.select(col("col"))
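My understanding of why this helps (an inference from the error message, not something documented in the thread): under Spark Connect, an attribute reference like df.col is tied to the plan of the DataFrame it was created from, and the DataFrame returned by DeltaTable.toDF() appears to carry a plan the server can't match, hence the "illegal references like `df1.select(df2.col("a"))`" wording. col("col") is a plain by-name reference with no plan attached, so it sidesteps that check.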

Olly
New Contributor II

That works, and as mentioned it's easy to work around, as does replacing it with:

df = spark.table("test")
# Attribute-style references resolve fine on a DataFrame from spark.table()
df.select(df.col)
