balajij8
Contributor III

Hi,

1. What situations typically trigger AMBIGUOUS_REFERENCE_TO_FIELDS?

It occurs when Spark finds multiple columns with the same name at the same nesting level in a DataFrame. The most common causes are:

  • Wildcard expansion: .select("json.*") followed by .select("*", "k.*") leaves you with the struct field k (containing nested t) and a flat field t at the top level, which collides with any t that was already there
  • Union/join collisions: combining DataFrames that both have a field named t without aliasing one of them
  • Duplicate schema definitions: defining the same field twice in a StructType

2. Can nested fields in a separate schema cause this error?

Not by itself. The trade schema with t and the kline schema with k.t are each fine on their own.

The problem arises when you:

  • Expand struct fields with wildcards (select("k.*") promotes nested k.t to top-level t)
  • Combine both streams without distinct column names
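
To see why the wildcard is the culprit, here is a rough pure-Python model of how column names come out of a select(). It is only a sketch of the naming behaviour, not Spark code, and the schema (a top-level t next to a struct k with its own t) is a made-up example rather than your actual stream schema.

```python
# Pure-Python model of how "*" and "struct.*" expand into output column names.
# Hypothetical schema: top-level "t" and "s", plus a struct "k" that also has "t".
schema = {"t": "long", "s": "string", "k": {"t": "long", "c": "double"}}


def expand_select(schema, *exprs):
    """Return the output column names for a select() of plain names and wildcards."""
    names = []
    for expr in exprs:
        if expr == "*":
            names.extend(schema)                 # every top-level column, in order
        elif expr.endswith(".*"):
            names.extend(schema[expr[:-2]])      # struct fields promoted to top level
        else:
            names.append(expr.split(".")[-1])    # a nested path keeps only its last part
    return names


columns = expand_select(schema, "*", "k.*")
print(columns)  # ['t', 's', 'k', 't', 'c']
```

The name t shows up twice in the result, so any later reference to t has two candidates.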

3. Which operations commonly introduce the duplicate field?

  • Wildcard expansion with select("*") or select("struct_field.*") (most common)
  • Union/join operations without explicit column selection

4. What debugging steps would you recommend to identify which DataFrame contains the duplicate field?

Start with the parsing code:

  • Print df.columns after each transformation to spot duplicates
  • Call df.printSchema() to see whether the field appears at multiple nesting levels
  • Check for .select("json.*") or .select("*", "k.*") patterns
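
If eyeballing the output gets tedious, a small helper can do the checks above for you. This is a minimal sketch: duplicate_names and duplicate_fields are hypothetical helper names, the comparison is case-insensitive because Spark resolves column names case-insensitively by default, and the nested walk assumes the dict shape returned by df.schema.jsonValue(). It only descends into structs, not arrays or maps.

```python
from collections import Counter


def duplicate_names(names):
    """Return names that occur more than once, compared case-insensitively."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def duplicate_fields(struct, path=""):
    """Map each struct path to the field names duplicated directly inside it.

    `struct` is a schema dict shaped like df.schema.jsonValue().
    """
    found = {}
    fields = struct.get("fields", [])
    dups = duplicate_names(f["name"] for f in fields)
    if dups:
        found[path or "<root>"] = dups
    for f in fields:
        ftype = f["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            child = f"{path}.{f['name']}" if path else f["name"]
            found.update(duplicate_fields(ftype, child))
    return found


# Hypothetical schema: "t" twice at the root, and a struct "k" with its own "t".
example = {"type": "struct", "fields": [
    {"name": "t", "type": "long"},
    {"name": "k", "type": {"type": "struct",
                           "fields": [{"name": "t", "type": "long"}]}},
    {"name": "t", "type": "long"},
]}

print(duplicate_names(["t", "s", "k", "T"]))  # ['t']
print(duplicate_fields(example))              # {'<root>': ['t']}
```

In a notebook you would call duplicate_names(df.columns) after each transformation and duplicate_fields(df.schema.jsonValue()) once on the final DataFrame.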

For the kline stream, use explicit nested field paths with aliases, as you already do for the trade stream.
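
One way to do that without typing every path by hand is to derive the alias from the full nested path, so a promoted field carries its parent's name. The sketch below is an assumption about how you might structure it: aliased_paths is a hypothetical helper, the kline schema is a simplified stand-in for yours, and it expects the dict shape returned by df.schema.jsonValue().

```python
def aliased_paths(struct, prefix=""):
    """Yield (nested_path, alias) pairs for every leaf field of a schema dict.

    The alias joins the path parts with "_", so k.t becomes k_t.
    """
    for field in struct.get("fields", []):
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            yield from aliased_paths(ftype, path)   # recurse into nested structs
        else:
            yield path, path.replace(".", "_")


# Hypothetical, simplified kline schema: top-level "t" and a struct "k" with "t" and "c".
kline = {"type": "struct", "fields": [
    {"name": "t", "type": "long"},
    {"name": "k", "type": {"type": "struct", "fields": [
        {"name": "t", "type": "long"},
        {"name": "c", "type": "string"},
    ]}},
]}

pairs = list(aliased_paths(kline))
print(pairs)  # [('t', 't'), ('k.t', 'k_t'), ('k.c', 'k_c')]
```

With PySpark you could then select with something like df.select([col(p).alias(a) for p, a in pairs]), where col comes from pyspark.sql.functions. Field names that themselves contain dots would need backtick quoting, which this sketch does not handle.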