08-20-2024 05:53 AM - edited 08-20-2024 05:55 AM
Hi All,
I am using Databricks Connect 14.3.2 with Databricks Runtime 14.3 LTS to execute the code below. The CSV file is only 7 MB; the code runs without issues on Databricks Runtime 15+ clusters but consistently produces the error message shown below when using 14.3 LTS. Please advise.
SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) Not found any cached local relation with the hash: hash_guid in the session with sessionUUID session_guid.
import pandas as pd
from databricks.connect import DatabricksSession

# Connection details (redacted)
cluster_id = '*****'
user_token = "*******"
host = 'https://***********.azuredatabricks.net/'

# Build a Spark Connect session against the remote cluster
sp = DatabricksSession.builder.remote(
    host=host, cluster_id=cluster_id, token=user_token
).getOrCreate()

# Convert a local pandas DataFrame to a Spark DataFrame -- this is the
# step that raises the InvalidPlanInput error on 14.3 LTS
df = sp.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)
09-11-2024 09:01 AM
As a workaround, please try the following Spark configuration, which seems to have resolved the issue for me on both 14.3 LTS and 15.4 LTS.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)
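For context, here is how I applied it end-to-end (a minimal sketch reusing the redacted connection details and CSV path from the original post; the comment on the mechanism is my reading of the error message, not something Databricks has confirmed):

import pandas as pd
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host='https://***********.azuredatabricks.net/',
    cluster_id='*****',
    token='*******',
).getOrCreate()

# Raise the size threshold above which local relations are cached
# server-side; frames below it are shipped inline with the plan, which
# appears to sidestep the cached-local-relation lookup that fails here.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)

df = spark.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)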
08-20-2024 08:46 AM
Same here, it was working yesterday and stopped today.
08-20-2024 01:29 PM
I managed to upload data in batches of 500 rows.
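Roughly like this, in case it helps (a sketch under assumptions: spark is an existing Databricks Connect session, the CSV path matches the original post, and 500 was simply the chunk size that worked for me):

import pandas as pd
from functools import reduce

# Read the file locally, then convert it in 500-row chunks so each
# upload stays below whatever size triggers the error.
pdf = pd.read_csv("C:\\temp\\data.csv")
chunks = [pdf.iloc[i:i + 500] for i in range(0, len(pdf), 500)]

# Union the per-chunk Spark DataFrames back into a single DataFrame.
df = reduce(
    lambda left, right: left.union(right),
    (spark.createDataFrame(chunk) for chunk in chunks),
)
df.show(5)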
08-20-2024 10:17 PM
Same here, it stopped working yesterday. I can confirm it works with smaller data sets. Has there been any communication from Databricks about this?
08-21-2024 02:23 AM
I've raised this issue with Databricks support, and they are currently investigating it.
08-22-2024 02:57 AM
Any updates on this issue? I ran into the same problem: scripts were running just fine, and without any changes they stopped working with this error message.
08-22-2024 03:46 AM
@Retired_mod, is there any update on the issue?
@KBoogaard, I raised this with Microsoft support; however, they are unable to replicate the issue. I have a call with them on Monday.
08-22-2024 06:20 AM
I'm having the same problem.
I create a Spark DataFrame from a pandas DataFrame (10,000 rows, between 500 and 800 columns) and want to upload it. This worked fine two weeks ago; now I'm getting the error. For some files it still works; for others it only works after reducing the number of rows and columns.
08-22-2024 06:59 AM
I got word from our Databricks account manager that this is a known issue they are working on, although that is taking a lot of time. For us this is a huge blocker for going to production!
08-23-2024 12:46 AM
Hi @Retired_mod any news?
08-26-2024 05:26 AM
Try converting the pandas df to a dict before passing it on.
E.g. createDataFrame(pd.read_csv('abc').to_dict('records'))
In my case there was some non-serializable attribute on the pandas df which was dropped when converting to a dict.
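Spelled out (a sketch; spark is assumed to be an existing session, the CSV path is reused from the original post, and the 'records' orientation is needed so createDataFrame receives a list of row dicts rather than a column-oriented mapping):

import pandas as pd

# to_dict('records') yields plain Python row dicts, dropping any
# pandas-specific attributes that failed to serialize in my case.
rows = pd.read_csv("C:\\temp\\data.csv").to_dict('records')

df = spark.createDataFrame(rows)
df.show(5)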
Btw, logging in here to answer was an insane obstacle. Our large company subscription is set up wrong, so we can't log into the community because our admin accounts don't have actual email addresses. Setting up a personal account was only possible when not using the corp WiFi at all; otherwise I get kicked.
08-26-2024 10:11 AM
Hi @Retired_mod, this is a really bad support experience. Is this how Databricks support manages issues? I am currently considering a different solution; this has been an outage for several days now.
08-26-2024 10:17 AM
No updates from my side either. We're currently using the 15.4 LTS runtime, and it's working fine. The issue seems to be with the 14.3 LTS version only.
08-28-2024 09:48 PM
Microsoft support confirmed that the fix has been merged and is set for release on September 3rd.
08-30-2024 07:59 AM
Is this fix confirmed for other runtimes? I am having the same issue on 13.3 LTS.