DBR 12.2: DeltaOptimizedWriter: Resolved attribute(s) missing from in operator

ivanychev
Contributor II

After upgrading from DBR 11.3 LTS to DBR 12.2 LTS, we started seeing the following error in a piece of logic that reads from Parquet and writes to Delta.

AnalysisException: Resolved attribute(s) group_id#72,display_name#73,parent_id#74,path#75,path_list#76 missing from day#178,ac_key#179,group_id#180,display_name#181,parent_id#182,path#183,path_list#184 in operator !Project [empty2null(day#178) AS day#568, empty2null(ac_key#179) AS ac_key#569, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]. Attribute(s) with the same name appear in the operation: group_id,display_name,parent_id,path,path_list. Please check if the right attribute(s) are used.;
WriteIntoDeltaCommand OutputSpec(s3://constructor-analytics-data/tables/delta_prod/item_groups,Map(),ArrayBuffer(day#178, ac_key#179, group_id#180, display_name#181, parent_id#182, path#183, path_list#184))
+- DeltaOptimizedWriter [day, ac_key], com.databricks.sql.transaction.tahoe.DeltaLog@3dc2a8b5, [spark.databricks.delta.optimize.minFileSize=268435456, spark.databricks.delta.autoCompact.maxFileSize=134217728, spark.databricks.delta.optimize.maxFileSize=268435456, spark.databricks.delta.autoCompact.minFileSize=67108864]
   +- DeltaInvariantChecker [Check(EXPRESSION(('day = 2023-03-07)),('day = 2023-03-07)), Check(EXPRESSION(('ac_key = key_ZMdl8uk3o2FQ3Bc9)),('ac_key = key_ZMdl8uk3o2FQ3Bc9))]
      +- !Project [empty2null(day#178) AS day#568, empty2null(ac_key#179) AS ac_key#569, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
         +- Project [day#164 AS day#178, ac_key#165 AS ac_key#179, group_id#166 AS group_id#180, display_name#167 AS display_name#181, parent_id#168 AS parent_id#182, path#169 AS path#183, path_list#170 AS path_list#184]
            +- Project [day#108 AS day#164, ac_key#116 AS ac_key#165, group_id#124 AS group_id#166, display_name#132 AS display_name#167, parent_id#140 AS parent_id#168, path#148 AS path#169, path_list#156 AS path_list#170]
               +- Project [day#108, ac_key#116, group_id#124, display_name#132, parent_id#140, path#148, path_list#76 AS path_list#156]
                  +- Project [day#108, ac_key#116, group_id#124, display_name#132, parent_id#140, path#75 AS path#148, path_list#76]
                     +- Project [day#108, ac_key#116, group_id#124, display_name#132, parent_id#74 AS parent_id#140, path#75, path_list#76]
                        +- Project [day#108, ac_key#116, group_id#124, display_name#73 AS display_name#132, parent_id#74, path#75, path_list#76]
                           +- Project [day#108, ac_key#116, group_id#72 AS group_id#124, display_name#73, parent_id#74, path#75, path_list#76]
                              +- Project [day#108, ac_key#93 AS ac_key#116, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
                                 +- Project [day#84 AS day#108, ac_key#93, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
                                    +- Project [day#84, ac_key#93, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
                                       +- Project [day#84, key_ZMdl8uk3o2FQ3Bc9 AS ac_key#93, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
                                          +- Project [2023-03-07 AS day#84, ac_key#71, group_id#72, display_name#73, parent_id#74, path#75, path_list#76]
                                             +- Relation [day#70,ac_key#71,group_id#72,display_name#73,parent_id#74,path#75,path_list#76] parquet

The weird thing is that the !Project node references group_id#72, but its child Project outputs group_id#180, as if there's a bug in the plan. There are no joins in this pipeline; it's as simple as read + write to Delta.
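
To spell out the mismatch, here is a small plain-Python sketch (not Spark code) that compares the attributes referenced by the !Project node with the attributes its child Project outputs. Both lists are copied by hand from the plan above; the helper name split_attr is made up for this illustration.

```python
# Attributes referenced by the failing !Project node (copied from the plan above).
project_refs = ["day#178", "ac_key#179", "group_id#72", "display_name#73",
                "parent_id#74", "path#75", "path_list#76"]

# Attributes output by its child Project (copied from the plan above).
child_output = ["day#178", "ac_key#179", "group_id#180", "display_name#181",
                "parent_id#182", "path#183", "path_list#184"]


def split_attr(attr):
    """Hypothetical helper: split 'name#id' into (name, id)."""
    name, expr_id = attr.rsplit("#", 1)
    return name, int(expr_id)


# References that the child does not produce under the same expression ID.
available = set(child_output)
missing = [a for a in project_refs if a not in available]

# For each missing reference, look up the same-named attribute in the child.
child_ids = dict(split_attr(a) for a in child_output)
same_name = {}
for attr in missing:
    name, ref_id = split_attr(attr)
    if name in child_ids:
        same_name[name] = (ref_id, child_ids[name])

for name, (ref_id, child_id) in same_name.items():
    print(f"{name}: !Project wants #{ref_id}, child provides #{child_id}")
```

It flags the same five attributes as the AnalysisException: each one exists in the child by name, but under a newer expression ID (for example, group_id is #72 in the !Project and #180 in the child), while day and ac_key line up.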

Do you have any idea what could be wrong here? A DeltaOptimizedWriter issue, perhaps?

Sergey