08-20-2024 05:53 AM - edited 08-20-2024 05:55 AM
Hi All,
I am using Databricks Connect 14.3.2 against a Databricks Runtime 14.3 LTS cluster to execute the code below. The CSV file is only 7 MB; the code runs without issues on Runtime 15+ clusters but consistently fails with the error shown below on 14.3 LTS. Please advise.
SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) Not found any cached local relation with the hash: hash_guid in the session with sessionUUID session_guid.
import pandas as pd
from databricks.connect import DatabricksSession

# Connection details redacted.
cluster_id = "*****"
user_token = "*******"
host = "https://***********.azuredatabricks.net/"

# Build a Databricks Connect session against the remote cluster.
sp = DatabricksSession.builder.remote(
    host=host, cluster_id=cluster_id, token=user_token
).getOrCreate()

# Read the ~7 MB CSV locally and ship it to the cluster as a Spark DataFrame.
df = sp.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)
09-05-2024 05:50 AM
Unfortunately I don't have this information. I only raised it for 14.3 LTS since my Databricks Connect version is the same (14.3.1).
09-05-2024 05:12 AM
The error is now occurring for me on 15.4 clusters as well. Has the fix been released yet?
09-05-2024 06:03 AM
I don't think it has been released yet, and now I’m facing this issue on both 14.3 LTS and 15.4 LTS :(.
FYI @Retired_mod
09-05-2024 08:00 AM
I have the same issue with the 13.3 LTS runtime.
09-06-2024 12:29 AM
I was informed that the fix was released. According to our tests, however, it did not fix anything; instead, 15.4 LTS is now broken too. This topic is getting urgent for us.
09-08-2024 11:25 PM
Any updates on this? I'm facing the same issue with 15.4 LTS now as well.
09-09-2024 01:28 AM - edited 09-09-2024 01:29 AM
Microsoft support just mentioned that the fix has been deployed by Databricks, but the issue persists for me on both 14.3 LTS and 15.4 LTS.
09-09-2024 01:55 AM
Now I'm seeing the same issue with 15.4 LTS. Does the fix for 14.3 LTS work? Thanks!
09-10-2024 02:25 AM
I still have the issue, but I noticed I do not have it on Linux. Support told me to use this line:
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)
With that, it worked on Windows as well; the 64 MiB value also hints at what the batch-size threshold should be.
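In case it helps others, here is a minimal sketch of how that line slots into the original repro. The connection details are placeholders, and the comment about inlining reflects my reading of the Spark Connect behavior rather than anything official:

import pandas as pd
from databricks.connect import DatabricksSession

# Placeholder connection details; substitute your own workspace values.
spark = DatabricksSession.builder.remote(
    host="https://<workspace>.azuredatabricks.net/",
    cluster_id="<cluster-id>",
    token="<token>",
).getOrCreate()

# Raise the local-relation cache threshold to 64 MiB before calling
# createDataFrame. Local data under the threshold appears to be inlined
# into the query plan instead of being uploaded to a server-side cache
# and referenced by hash, which is the lookup that fails above.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)

# The ~7 MB CSV is now well under the threshold.
df = spark.createDataFrame(pd.read_csv("C:\\temp\\data.csv"))
df.show(5)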
09-10-2024 09:06 PM
I have a troubleshooting session scheduled with Databricks today regarding this issue and will keep everyone updated on the progress.
09-11-2024 09:01 AM
As a workaround, please try the following Spark configuration, which seems to have resolved the issue for me on both 14.3 LTS and 15.4 LTS.
spark.conf.set("spark.sql.session.localRelationCacheThreshold", 64 * 1024 * 1024)
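If you want to confirm the setting actually took effect on your session before retrying, reading it back is a quick sanity check (Spark returns conf values as strings):

# Expect '67108864', i.e. 64 * 1024 * 1024.
print(spark.conf.get("spark.sql.session.localRelationCacheThreshold"))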
09-11-2024 09:04 PM
Databricks confirmed the same workaround while they work on a permanent fix.