09-30-2024 08:05 AM
Can anyone please tell me why df.cache() and df.persist() are not supported on Serverless compute?
Many Thanks
Accepted Solutions
09-30-2024 08:32 AM
Global caching functionality (and other global state used on classic clusters) is conceptually hard to represent on serverless compute.
The serverless Spark cluster optimizes caching itself rather than leaving it to the user.
~
09-30-2024 08:59 AM
Many Thanks
02-27-2025 01:49 PM
What I do wish were possible is for serverless to warn that caching is not supported rather than error on the call. The current behavior makes switching between compute types (serverless and all-purpose) brittle and prevents code from being easily interoperable regardless of compute type, which is significant friction against adopting serverless completely. Even a parameter (e.g. .cache(try=True)) would support this kind of workflow more elegantly.
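In the meantime, the .cache(try=True) idea can be approximated with a small wrapper that attempts the cache and falls back gracefully when the compute rejects it. This is only a sketch: cache_if_supported is a hypothetical helper name, and catching a broad Exception stands in for whatever specific error serverless compute actually raises.

```python
def cache_if_supported(df):
    """Try to cache a DataFrame-like object; if the current compute
    (e.g. serverless) does not support caching, warn and return the
    DataFrame unchanged so the same code runs on any compute type."""
    try:
        return df.cache()
    except Exception as exc:  # serverless raises instead of silently ignoring
        print(f"cache() not supported on this compute, continuing uncached: {exc}")
        return df
```

The same pattern works for df.persist(). The trade-off is that a broad except can mask unrelated failures, so in real code you would narrow the exception type once you know what serverless actually throws.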

