Photon plan invariant violated Error

Vishwanath_Rao
New Contributor II

We've run into a niche error: we get the message below only in our non-prod environment, with the same data and the same code as our prod environment.

org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException: Photon plan invariant violated: Broadcast joins have incompatible build plans regarding Photonized vs. not-Photonized: ArrayBuffer(BroadcastHashJoin [COLUMN1#1372320], [COLUMN2#1422501], LeftOuter, BuildRight, false

I'm unsure what the issue might be. I've checked the Spark logs and they give nothing on it either, and there doesn't seem to be any documentation on this error. Any help is appreciated.


Kaniz
Community Manager

Hi @Vishwanath_Rao, the org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException you’re encountering is related to Adaptive Query Execution (AQE) in Spark SQL. AQE is an optimization technique that dynamically adjusts query execution plans based on runtime statistics [1].

Let’s break down the issue and explore potential solutions:

  1. Photon Plan Invariant Violation:

    • The error message indicates that there’s an issue with a broadcast join involving columns COLUMN1 and COLUMN2.
    • Specifically, it mentions Photonized vs. not-Photonized build plans, which suggests a discrepancy in how the join is being executed.
  2. Possible Causes:

    • Photonization refers to running operators on Photon, Databricks’ native vectorized execution engine; a plan fragment is “Photonized” when it executes in Photon rather than in standard Spark.
    • The error suggests that the build plans for the broadcast join are inconsistent in terms of photonization.
    • This could be due to differences in data distribution, configuration settings, or other factors between your non-prod and prod environments.
  3. Troubleshooting Steps:

    • Here are some steps to investigate further:
      • Configuration Settings:
        • Compare Spark- and Photon-related configuration (e.g., whether Photon and AQE are enabled, broadcast join thresholds, excluded optimizer rules) between the two environments.
      • Data Distribution:
        • Examine the data distribution for the involved columns (COLUMN1 and COLUMN2). Are there significant differences in data skew or size?
        • Ensure that the data statistics (e.g., cardinality, min/max values) are accurate.
      • Join Strategy:
        • AQE dynamically selects join strategies based on runtime statistics. Consider explicitly specifying a join strategy (e.g., broadcast, shuffle hash) to see if it resolves the issue.
        • You can use hints like /*+ BROADCAST(tableName) */ to force a broadcast join; note that the hint names the relation to broadcast, not the join columns.
      • Logs and Metrics:
        • Although you mentioned that the Spark logs didn’t provide much information, recheck the logs for any additional clues.
        • Monitor metrics related to memory usage, shuffle data, and execution time during query execution.
      • Spark Version:
        • Ensure that both environments are running the same Spark version.
        • Sometimes issues are specific to certain Spark releases.
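The statistics and join-hint suggestions above can be sketched as Spark SQL. The table names, aliases, and the exact query are placeholders (not from the original post); the column names echo the error message. On a cluster, each statement would be run with spark.sql(...).

```python
# Illustrative Spark SQL for the troubleshooting steps above.
# fact_table / dim_table are hypothetical names.

# Refresh table statistics so AQE sees accurate cardinalities and sizes.
analyze_stmt = "ANALYZE TABLE dim_table COMPUTE STATISTICS FOR ALL COLUMNS"

# Force the broadcast build side explicitly instead of letting AQE decide.
# The BROADCAST hint names the relation to broadcast, not the join columns.
hinted_query = """
SELECT /*+ BROADCAST(d) */ f.COLUMN1, d.COLUMN2
FROM fact_table f
LEFT OUTER JOIN dim_table d
  ON f.COLUMN1 = d.COLUMN2
"""
```

Pinning the build side this way can confirm whether the invariant violation only appears when AQE picks the broadcast strategy on its own.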

Remember that debugging such issues can be challenging, but systematically investigating the factors mentioned above should help narrow down the root cause.

Good luck, and I hope this helps you resolve the issue! 🚀🔍

[1]: Spark Performance Tuning Documentation

[2]: Databricks KB: Disable Vectorized Parquet Reader

 

Vishwanath_Rao
New Contributor II

Thank you @Kaniz! It looks like the issue was with a recent release on the Databricks end; I'd raised a support ticket just to be sure. The difference also went away once we set these two at the cluster level:

spark.conf.set("spark.sql.adaptive.enabled", "true")

spark.conf.set("spark.sql.optimizer.excludedRules", "")
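For anyone hitting the same error, a minimal sketch of applying and verifying those two settings at the session level (the config keys are from the post above; reading the values back afterwards is my addition). This is a config fragment that assumes an active Spark session, e.g. a Databricks notebook:

```python
# Apply the workaround settings and read them back to confirm they took effect.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Re-enable AQE and clear any excluded optimizer rules.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.excludedRules", "")

# Confirm the values at the session level.
print(spark.conf.get("spark.sql.adaptive.enabled"))
print(spark.conf.get("spark.sql.optimizer.excludedRules"))
```

Setting these at the cluster level (as in the post above) applies them to every session on that cluster, which is usually what you want when the mismatch shows up across jobs.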
