Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolution (topic in Data Engineering)

Ramana — Thu, 11 Sep 2025 19:50:07 GMT

Hello Community,

We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them.

When we try to execute the existing jobs with Serverless Compute, if the job deals with a small amount of data or a small number of stages, Serverless Compute works great. But when we try to use the Serverless Compute for processing a large amount of data with a large number of intermediate transformations, the job fails with the following error:

Exception: (java.lang.RuntimeException) Max iterations (1000) reached for batch Resolution, please set 'spark.sql.analyzer.maxIterations' to a larger value

This error indicates that the query plan required more than the default 1000 iterations to resolve, likely due to deeply nested logic or complex transformations in our code. However, in serverless environments, the spark.sql.analyzer.maxIterations configuration is not accessible or overridable, as it is not exposed via Spark Connect.

Has anyone faced a similar issue?

Any suggestion or recommendation is greatly appreciated.

#ServerlessCompute

#DataEngineering

#ClassicCompute-to-ServerlessCompute-Migration

#Migration

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

K_Anudeep — Sat, 13 Sep 2025 10:22:59 GMT

Hello @Ramana ,

The above error occurs when the Spark SQL analyzer cannot resolve a query within the maximum number of rule-application iterations (1000 here, per the error message) for the "Resolution" batch of its logical-plan analysis. This typically happens with particularly complex queries, especially those that involve:

Excessively deep or "chained" query plans, often produced when repeatedly applying DataFrame transformations, like many chained .withColumn() calls
Highly nested views or subqueries, especially those involving multiple self-joins or recursive structures
Generated query plans with conflicting or redundant attributes

In Databricks serverless and some managed environments, most Spark SQL configs, including spark.sql.analyzer.maxIterations, cannot be changed. Thus, increasing the setting is not possible as a workaround on serverless.

So the only option is to reduce the complexity of the logical plan the analyser has to resolve. You can do that by breaking the query into smaller steps and materializing each one before proceeding to the next transformation.
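As a rough illustration of "materialize, then continue", here is a minimal sketch. It assumes you are allowed to write a scratch table; the helper name `materialize` and the table name `scratch.stage1` are made up for the example. The idea is that the DataFrame read back from the table starts from a plain table scan, so later transformations no longer carry the earlier chain in their plan.

```python
def materialize(spark, df, table_name):
    """Write `df` to a table and return a fresh DataFrame read from it.

    `spark` is the active SparkSession and `table_name` is a scratch table
    you are allowed to overwrite (both supplied by the caller).
    """
    # Persist the intermediate result. This executes the plan built so far.
    df.write.mode("overwrite").saveAsTable(table_name)
    # Reading it back gives a DataFrame whose plan is only a table scan.
    return spark.table(table_name)


# Usage (hypothetical names):
# stage1 = materialize(spark, raw_df.select("id", "amount"), "scratch.stage1")
# stage2 = stage1.withColumn("amount", stage1["amount"].cast("decimal(18,2)"))
```

This costs an extra write and read per stage, so it is probably only worth doing at a few points in a long chain, not after every transformation.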

Please let me know if you have any further questions

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

Ramana — Mon, 15 Sep 2025 14:19:18 GMT

Thanks for sharing your thoughts.

The job(s) actually failing are simple SELECT * FROM source_table WHERE where_clause with some level of JDBC partitioning and then simple transformations like casting the data types, applying some regex, etc. (maybe some 10-15 different transformations) at the dataframe level. These jobs are not at all complex, but still fail when we have a large amount of data. I don't think a logical plan varies based on data. I know the logical plan varies with the number and type of transformations, but (mostly) not with the size of the data; there may be some exceptions.

Persisting data at every intermediate state is not a good idea, especially with Serverless, because there is no support for caching, persisting, or some kinds of temp view creation.

I can do optimizations of complex queries, but I don't have any scope to optimize the simple SELECT and simple dataframe transformations.

The main goal of Serverless is to reduce this kind of burden on the processes by implementing these techniques dynamically, which is the reason Databricks does not allow us to set any custom configurations. If we need to do all of this, then Databricks should allow us to set these configurations.

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

K_Anudeep — Tue, 16 Sep 2025 04:50:40 GMT

Hello @Ramana ,

You’re right that data volume doesn’t change the logical plan, but your pattern (for example, SELECT * from a wide table plus 10–15 column transforms) can still exceed the analyzer’s fixed iteration cap on Serverless. Each * expansion and each chained withColumn/cast/regex adds more alias resolution work, and the result is a huge stack of projections that the analyzer cannot resolve within the cap.
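One common way to avoid the stack of projections is to express all the column transforms in a single select instead of one withColumn per column. Below is a minimal sketch of that idea; the helper `build_select_exprs`, the column names, and the `{c}` placeholder convention are all made up for illustration. It only builds SQL expression strings, which you would then pass to the standard `DataFrame.selectExpr`.

```python
def build_select_exprs(columns, transforms):
    """Return one SQL expression per column, in the original column order.

    columns:    list of column names (for example, df.columns)
    transforms: dict mapping a column name to a SQL template in which
                "{c}" stands for the backtick-quoted column
    Columns without a template are passed through unchanged.
    """
    exprs = []
    for name in columns:
        quoted = f"`{name}`"
        template = transforms.get(name)
        if template is None:
            exprs.append(quoted)
        else:
            # Keep the original column name so downstream code is unaffected.
            exprs.append(f"{template.format(c=quoted)} AS {quoted}")
    return exprs


# Hypothetical columns and transforms, for illustration only.
transforms = {
    "amount": "CAST({c} AS DECIMAL(18,2))",
    "phone": "regexp_replace({c}, '[^0-9]', '')",
}
exprs = build_select_exprs(["id", "amount", "phone"], transforms)

# One projection instead of one per withColumn call:
# df = df.selectExpr(*exprs)
```

Note that the templates go through `str.format`, so a regex containing literal braces (such as `{2}`) would need them doubled. Whether this is enough to stay under the cap depends on the job, so treat it as something to try, not a guaranteed fix.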

I would suggest finding out which rules are being applied, and why they exceed the default maximum, by setting spark.sql.planChangeLog.level to INFO, and then simplifying the code as required.

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

Ramana — Tue, 16 Sep 2025 15:01:15 GMT

Thank you.

But the same job with a limit clause works great. I don't think it is related to the logical plan, but I will look into your suggestion for tracing the issue down.

If serverless doesn't work for these basic transformations, it will be tough to utilize for complex jobs (like dynamic code generation jobs), which is what I am trying to convey here.

Migration from Classic to Serverless is not a straightforward approach, and it appears that most classic compute jobs should be rewritten to execute in Serverless.

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

Ramana — Wed, 17 Sep 2025 18:06:01 GMT

@K_Anudeep FYI: Serverless Compute doesn't support spark.sql.planChangeLog.level.

If we try to set it, the job fails with the error [CONFIG_NOT_AVAILABLE] Configuration spark.sql.planChangeLog.level is not available. SQLSTATE: 42K0I.

Classic Compute supports it (as expected). I am trying to capture the statistics on Classic, but so far I don't see anything suspicious there. Since I set the node type and the min and max workers, Classic can accommodate the load. On Serverless, however, I have no idea how to view these stats because there is no Spark UI available in Serverless Jobs (I know that Serverless has the Query history option, but I am not sure how this replaces the Spark UI).

I feel like switching from Classic to Serverless is an architectural change versus a simple migration/Spark version upgrade.

I will share the Classic stats soon.

That being said, I don't think Serverless is fit for any of my company's Spark jobs, at least for now; this may change in the future.

Re: Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolutio

Ramana — Wed, 12 Nov 2025 22:10:48 GMT

In Serverless Version 4, Databricks fixed this issue.