Serverless Compute no support for Caching data frames

Dave1967 — Mon, 30 Sep 2024 15:05:28 GMT

Can anyone please tell me why df.cache() and df.persist() are not supported on serverless compute?

Many Thanks

Re: Serverless Compute no support for Caching data frames

gchandra — Mon, 30 Sep 2024 15:32:02 GMT

Global caching functionality (and other global state used on classic clusters) is conceptually hard to represent on serverless compute.

On serverless, the Spark cluster optimizes the cache better than the user can.

Re: Serverless Compute no support for Caching data frames

Dave1967 — Mon, 30 Sep 2024 15:59:21 GMT

Many Thanks

Re: Serverless Compute no support for Caching data frames

kunalmishra9 — Thu, 27 Feb 2025 21:49:21 GMT

What I wish were possible is for serverless to warn that caching is not supported rather than raise an error on the call. As it stands, switching between compute types (serverless and all-purpose) is brittle and code is not easily interoperable across them, which is significant friction against adopting serverless completely. Even a parameter (e.g. .cache(try=True)) would support this kind of workflow more elegantly.
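
Until something like that exists, a small user-side wrapper can approximate the "warn, don't fail" behaviour described above. This is only a sketch: `try_cache` is a hypothetical helper name, not a Spark or Databricks API, and it catches a broad `Exception` because the exact error type raised on serverless is not stated in this thread.

```python
import warnings


def try_cache(df):
    """Call df.cache() if the compute allows it; otherwise warn and return df as-is.

    Hypothetical helper, not part of PySpark. Works with any object that
    exposes a .cache() method returning a DataFrame-like object.
    """
    try:
        return df.cache()
    except Exception as exc:  # assumption: the exact exception type is not known here
        warnings.warn(
            f"cache() is not available on this compute; continuing without it ({exc})"
        )
        return df
```

Usage would be `df = try_cache(df)` in place of `df = df.cache()`, so the same notebook runs on both compute types. The trade-off is that a broad `except` can also hide unrelated failures, so narrowing it to the specific error you observe is advisable.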

Re: Serverless Compute no support for Caching data frames

mrroger — Tue, 13 May 2025 12:50:44 GMT

Hi. I'm not fully convinced that Serverless can optimize Spark cache better than the user, since I still see query plans with recomputed operations. What is the recommended best practice to avoid recomputation in a Serverless environment? Write out intermediate dataframes?
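
For reference, the "write out intermediate dataframes" option raised in the question could look roughly like the sketch below. It is not a confirmed best practice from this thread. `materialize` and the table name are hypothetical; the calls used (`DataFrame.write`, `DataFrameWriter.mode`, `DataFrameWriter.saveAsTable`, `SparkSession.table`) are standard PySpark.

```python
def materialize(spark, df, table_name):
    """Write df to a table and return a new DataFrame that reads from it.

    Hypothetical helper: downstream operations start from the stored table,
    so the upstream plan is not recomputed for each action.
    """
    # Overwrite so that re-running the notebook replaces the intermediate result.
    df.write.mode("overwrite").saveAsTable(table_name)
    # Read back from storage; this DataFrame's lineage begins at the table.
    return spark.table(table_name)
```

Usage might be `stage1 = materialize(spark, expensive_df, "scratch.stage1")`, where `scratch.stage1` is a placeholder name. Unlike a cache, the table persists after the session ends, so it needs to be dropped explicitly when no longer required.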