08-20-2024 05:53 AM - edited 08-20-2024 05:55 AM
Hi All,
I am using Databricks Connect 14.3.2 with Databricks Runtime 14.3 LTS to execute the code below. The CSV file is only 7 MB; the code runs without issues on Databricks Runtime 15+ clusters but consistently produces the error message shown below when using 14.3 LTS. Please advise.
SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) Not found any cached local relation with the hash: hash_guid in the session with sessionUUID session_guid.
import pandas as pd
from databricks.connect import DatabricksSession

# Connection details (redacted)
cluster_id = '*****'
user_token = "*******"
host = 'https://***********.azuredatabricks.net/'

# Build a Spark Connect session against the remote cluster
sp = DatabricksSession.builder.remote(
    host=host, cluster_id=cluster_id, token=user_token
).getOrCreate()

# Convert a local pandas DataFrame to a Spark DataFrame -- this is the
# step that raises the InvalidPlanInput error on 14.3 LTS
df = sp.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)
09-11-2024 09:01 AM
As a workaround, please try the following Spark configuration, which seems to have resolved the issue for me on both 14.3 LTS and 15.4 LTS.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)
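For context, here is how I applied it end-to-end (a minimal sketch reusing the redacted connection details and CSV path from the original post; the comment on the mechanism is my reading of the error message, not something Databricks has confirmed):

import pandas as pd
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host='https://***********.azuredatabricks.net/',
    cluster_id='*****',
    token='*******',
).getOrCreate()

# Raise the size threshold above which local relations are cached
# server-side; frames below it are shipped inline with the plan, which
# appears to sidestep the cached-local-relation lookup that fails here.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)

df = spark.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)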
08-20-2024 08:46 AM
Same here, it was working yesterday and stopped today.
08-20-2024 01:29 PM
I managed to upload data in batches of 500 rows.
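Roughly like this, in case it helps (a sketch under assumptions: spark is an existing Databricks Connect session, the CSV path matches the original post, and 500 was simply the chunk size that worked for me):

import pandas as pd
from functools import reduce

# Read the file locally, then convert it in 500-row chunks so each
# upload stays below whatever size triggers the error.
pdf = pd.read_csv("C:\\temp\\data.csv")
chunks = [pdf.iloc[i:i + 500] for i in range(0, len(pdf), 500)]

# Union the per-chunk Spark DataFrames back into a single DataFrame.
df = reduce(
    lambda left, right: left.union(right),
    (spark.createDataFrame(chunk) for chunk in chunks),
)
df.show(5)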
08-20-2024 10:17 PM
Same here, it stopped working yesterday. I can confirm it works with smaller data sets. Has there been any communication from Databricks about this?
08-21-2024 02:23 AM
I've raised this issue with Databricks support, and they are currently investigating it.
08-22-2024 02:57 AM
Any updates on this issue? I ran into the same problem: scripts were running just fine, and without any changes they stopped working with this error message.
08-22-2024 03:46 AM
@Retired_mod, is there any update on the issue?
@KBoogaard, I raised this with Microsoft support; however, they are unable to replicate the issue. I have a call with them on Monday.
08-22-2024 06:20 AM
I'm having the same problem.
I create a Spark DataFrame from a pandas DataFrame (10,000 rows, between 500 and 800 columns) and want to upload it. This worked fine two weeks ago; now I'm getting the error. For some files it still works; for others it only works after reducing the number of rows and columns.
08-22-2024 06:59 AM
I got word from our Databricks account manager that this is a known issue they are working on, although that is taking a lot of time. For us this is a huge blocker for going to production!
08-23-2024 12:46 AM
Hi @Retired_mod any news?
08-26-2024 05:26 AM
Try converting the pandas df to a dict before passing it on.
E.g. createDataFrame(pd.read_csv('abc').to_dict('records'))
In my case there was some non-serializable attribute on the pandas df which was dropped when converting to a dict.
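Spelled out (a sketch; spark is assumed to be an existing session, the CSV path is reused from the original post, and the 'records' orientation is needed so createDataFrame receives a list of row dicts rather than a column-oriented mapping):

import pandas as pd

# to_dict('records') yields plain Python row dicts, dropping any
# pandas-specific attributes that failed to serialize in my case.
rows = pd.read_csv("C:\\temp\\data.csv").to_dict('records')

df = spark.createDataFrame(rows)
df.show(5)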
Btw, logging in here to answer was an insane obstacle. Our large company subscription is set up wrong, so we can't log into the community because our admin accounts don't have actual email addresses. Setting up a personal account was only possible when not using the corp WiFi at all; otherwise I get kicked.
08-26-2024 10:11 AM
Hi @Retired_mod, this is a really bad support experience. Is this how Databricks support manages issues? I am currently considering a different solution; this has been an outage for several days now.
08-26-2024 10:17 AM
No updates from my side either. We're currently using the 15.4 LTS runtime, and it's working fine. The issue seems to be with the 14.3 LTS version only.
08-28-2024 09:48 PM
Microsoft support confirmed that the fix has been merged and is set for release on September 3rd.
08-30-2024 07:59 AM
Is this fix confirmed for other runtimes? I am having the same issue on 13.3 LTS.