09-30-2024 08:05 AM
Can anyone please tell me why df.cache() and df.persist() are not supported in Serverless compute?
Many Thanks
09-30-2024 08:32 AM
Global caching functionality (and other global state used on classic clusters) is conceptually hard to represent on serverless compute.
The serverless Spark cluster optimizes caching on your behalf, rather than relying on user-managed caching.
09-30-2024 08:59 AM
Many Thanks
05-13-2025 05:50 AM
Hi. I'm not fully convinced that Serverless can optimize Spark cache better than the user, since I still see query plans with recomputed operations. What is the recommended best practice to avoid recomputation in a Serverless environment? Write out intermediate dataframes?
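One common workaround for the recomputation question above is to materialize an intermediate DataFrame to a table and read it back, so downstream queries scan the stored result instead of re-running the upstream plan. A minimal sketch of such a helper is below; `materialize` is a hypothetical name (not a Spark API), and it assumes `df` and `spark` follow the standard PySpark `DataFrameWriter` / `DataFrameReader` interfaces:

```python
def materialize(df, spark, table_name):
    """Write an intermediate DataFrame out once, then return a fresh
    DataFrame backed by the stored table. The returned DataFrame's plan
    is a simple table scan, so downstream reuse does not recompute the
    original transformations. Hypothetical helper, not a Spark API."""
    # Persist the intermediate result to a managed table
    # (overwrite so reruns of the job are idempotent).
    df.write.mode("overwrite").saveAsTable(table_name)
    # Read it back; this breaks the lineage to the original plan.
    return spark.read.table(table_name)
```

Usage would look like `clean = materialize(raw.filter(...), spark, "tmp_clean")`, after which multiple aggregations over `clean` scan the table rather than repeating the filter. The trade-off versus `cache()` is extra storage I/O in exchange for behavior that works identically on serverless and classic compute.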
02-27-2025 01:49 PM
What I do wish were possible is for serverless to warn that caching is not supported, rather than error on the call. The current behavior makes switching between compute types (serverless and all-purpose) brittle and keeps code from being interoperable regardless of compute type, which is significant friction against adopting serverless completely. Even a parameter (e.g. .cache(try=True)) would support this kind of workflow more elegantly.
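Until something like that exists, a small wrapper can give the desired "warn, don't error" behavior today. This is a sketch of a hypothetical helper (not a Databricks or Spark API); it assumes the only signal available is the exception raised when caching is unsupported:

```python
import warnings

def cache_if_supported(df):
    """Try to cache a DataFrame; on compute that rejects caching
    (e.g. serverless), warn and return the uncached DataFrame instead
    of raising. Hypothetical helper, not a Spark API."""
    try:
        return df.cache()
    except Exception as exc:
        # Caching is a performance hint, not a correctness requirement,
        # so degrading to the uncached DataFrame is safe.
        warnings.warn(f"cache() not supported on this compute: {exc}")
        return df
```

The same code path then runs unchanged on all-purpose and serverless compute; on classic clusters the DataFrame is cached as usual, and on serverless the call degrades to a no-op with a warning.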