Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
11-17-2022 01:47 PM
I have been getting this error sporadically. I'm loading a dataset and training a model on it in a notebook. I have seen similar posts and tried all the solutions mentioned there: raising the log output size limit, tuning the spark.network.timeout configurations, and creating a temporary view. None of them fundamentally solved the issue; sometimes the job runs without any problems, and sometimes I get the error above. I'm fairly sure there are no memory issues, since I have allocated enough cluster memory. Could you please shed some light on what is causing this? In particular, I don't understand why it breaks only some of the time, which makes it really hard to pinpoint. Thank you!
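For reference, these are roughly the timeout settings I tried. This is a minimal sketch only: on Databricks these properties normally go in the cluster's Spark config box rather than notebook code, and the values are just the ones I experimented with.

```python
from pyspark.sql import SparkSession

# Minimal standalone sketch; on Databricks, set these properties in the
# cluster's "Spark config" box instead, since the session is created for you.
spark = (
    SparkSession.builder
    .appName("rpc-timeout-test")  # hypothetical app name
    # Raise the network timeout from its 120s default so slow executors
    # aren't declared dead prematurely.
    .config("spark.network.timeout", "600s")
    # The heartbeat interval must stay well below spark.network.timeout.
    .config("spark.executor.heartbeatInterval", "60s")
    .getOrCreate()
)
```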
- Labels:
  - Remote RPC Client
  - Temporary View
11-17-2022 03:27 PM
@Leo Bao Are you seeing this issue with varying dataset sizes, or is the dataset the same size each time? If the issue appears with larger datasets, please check the link below and try increasing the number of partitions (a quick sketch follows): Databricks Spark Pyspark RDD Repartition - "Remote RPC client disassociated. Likely due to container...
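A minimal sketch of what the repartitioning would look like; the path and partition count below are placeholders, not values from this thread:

```python
# Illustrative only: the path and partition count are placeholders.
df = spark.read.parquet("/mnt/data/my_dataset")  # hypothetical path

print(df.rdd.getNumPartitions())  # inspect the current partition count

# Full shuffle into more, smaller partitions so no single task
# (and no single container) has to hold too much data at once.
df = df.repartition(400)
```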
11-17-2022 04:27 PM
Thank you for your reply! It happens with datasets of different sizes, and it's not just the larger ones; I see the issue even with smaller datasets. Out of curiosity, is there a rule of thumb for the size of each partition that might make it work? I did try adjusting the partition count (roughly as sketched below), and it still works sometimes and fails other times.
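For what it's worth, here is how I estimated a partition count when adjusting it, assuming the commonly cited guideline of roughly 128 MB per partition (which matches Spark's default spark.sql.files.maxPartitionBytes); the dataset size below is illustrative:

```python
# Rule-of-thumb estimate, assuming ~128 MB per partition.
target_partition_bytes = 128 * 1024 * 1024

dataset_bytes = 64 * 1024**3  # hypothetical: a 64 GB dataset
num_partitions = max(1, dataset_bytes // target_partition_bytes)
print(num_partitions)  # 512

df = df.repartition(int(num_partitions))
```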