Re: PySpark AnalysisException: Ambiguous reference... - Databricks Community

Hi,

1. What situations typically trigger AMBIGUOUS_REFERENCE_TO_FIELDS?

It occurs when Spark finds more than one column or field with the same name at the same nesting level of a DataFrame. The most common causes are:

Wildcard expansion: Using .select("json.*") followed by .select("*", "k.*") keeps the struct field k (containing nested t) and also adds a flat field t at the top level, which collides with any top-level t that is already there
Union/Join collisions: Combining DataFrames that both have fields named t without proper aliasing
Duplicate schema definitions: Defining the same field twice in a StructType

2. Can nested fields in a separate schema cause this error?

Not by themselves. The trade schema with t and the kline schema with k.t are each fine on their own.

The problem arises when you:

Expand struct fields with wildcards (select("k.*") promotes the nested k.t to a top-level t)
Combine both streams without giving the columns distinct names
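To make the promotion concrete, here is a small pure-Python model of what the wildcard selectors do to the column names. It is only a sketch: expand_select and the sample schema are made up for illustration, and this is not how Spark implements expansion internally.

```python
# Simplified model of star expansion; this is NOT Spark's implementation.
# A schema is a dict mapping a field name to a type string or,
# for a struct, to a nested dict of the same shape.

def expand_select(schema, *selectors):
    """Return the output column names for select(*selectors).

    Supports "*" (all top-level columns) and "name.*" (all fields of the
    top-level struct `name`); any other selector is kept as a plain column.
    """
    out = []
    for sel in selectors:
        if sel == "*":
            out.extend(schema.keys())
        elif sel.endswith(".*"):
            out.extend(schema[sel[:-2]].keys())
        else:
            out.append(sel)
    return out

# Hypothetical parsed schema: a top-level t plus a struct k that also has t.
parsed = {"t": "long", "k": {"t": "long", "c": "string"}}

columns = expand_select(parsed, "*", "k.*")
print(columns)  # ['t', 'k', 't', 'c']
```

The name t shows up twice in the result, which is the situation that later makes a reference to t ambiguous.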

3. Which operations commonly introduce the duplicate field?

Wildcard expansion with select("*") or select("struct_field.*") (most common)
Union or join operations without explicit column selection
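Before a join, it can help to list the names the two sides share. A minimal sketch, assuming you pass in the two df.columns lists; overlapping_columns and the sample column names are hypothetical. Names are compared case-insensitively by default because Spark resolves column names that way unless spark.sql.caseSensitive is enabled.

```python
def overlapping_columns(left_columns, right_columns, case_sensitive=False):
    """Return the names from left_columns that also occur in right_columns."""
    norm = (lambda s: s) if case_sensitive else str.lower
    right = {norm(c) for c in right_columns}
    return [c for c in left_columns if norm(c) in right]

# Hypothetical inputs, e.g. trade_df.columns and kline_df.columns.
trade_cols = ["s", "t", "p"]
kline_cols = ["s", "t", "o", "c"]

shared = overlapping_columns(trade_cols, kline_cols)
print(shared)  # ['s', 't']
```

Any name in the result other than the join key is a candidate for an alias before the two DataFrames are combined.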

4. What debugging steps would you recommend to identify which DataFrame contains the duplicate field?

Start with the parsing code:

Print df.columns after each transformation to spot duplicates
Call df.printSchema() to see whether the field appears at more than one level
Check for .select("json.*") or .select("*", "k.*") patterns
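If eyeballing the output gets tedious, the same checks can be scripted. The sketch below takes plain Python data: a list such as df.columns, and a dict shaped like the output of df.schema.jsonValue(). The helper names and the sample schema are hypothetical, and names are compared case-insensitively by default to match Spark's default resolution.

```python
from collections import Counter

def duplicate_names(names, case_sensitive=False):
    """Return the sorted names that occur more than once."""
    key = (lambda s: s) if case_sensitive else str.lower
    counts = Counter(key(n) for n in names)
    return sorted(n for n, c in counts.items() if c > 1)

def duplicate_fields(schema_json, path=""):
    """Walk a dict shaped like df.schema.jsonValue() and return
    {struct_path: [duplicated names]} for every struct with repeats."""
    found = {}
    if not isinstance(schema_json, dict):
        return found  # primitive types are plain strings such as "long"
    kind = schema_json.get("type")
    if kind == "struct":
        fields = schema_json["fields"]
        dups = duplicate_names([f["name"] for f in fields])
        if dups:
            found[path or "<root>"] = dups
        for f in fields:
            child = f"{path}.{f['name']}" if path else f["name"]
            found.update(duplicate_fields(f["type"], child))
    elif kind == "array":
        found.update(duplicate_fields(schema_json["elementType"], path))
    elif kind == "map":
        found.update(duplicate_fields(schema_json["valueType"], path))
    return found

def field(name, dtype):
    """Build one field entry in the jsonValue() layout."""
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}

# Hypothetical schema with a repeated top-level t and a t/T pair inside k.
schema = {"type": "struct", "fields": [
    field("t", "long"),
    field("k", {"type": "struct",
                "fields": [field("t", "long"), field("T", "long")]}),
    field("t", "long"),
]}

report = duplicate_fields(schema)
print(report)  # {'<root>': ['t'], 'k': ['t']}
```

With a real DataFrame you would call duplicate_names(df.columns) and duplicate_fields(df.schema.jsonValue()) after each transformation and stop at the first step that reports something.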

To fix it, use explicit nested field paths with proper aliasing for the kline stream, as you already do for the trade stream.
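One way to keep the aliases consistent is to generate the expressions instead of typing them by hand. This is a sketch under assumptions: aliased_exprs, the field list and the kline prefix are made up for illustration, and the field names are assumed to be plain identifiers that need no quoting.

```python
def aliased_exprs(struct_name, fields, prefix):
    """Build selectExpr strings such as 'k.t AS kline_t'."""
    return [f"{struct_name}.{f} AS {prefix}_{f}" for f in fields]

# Hypothetical kline fields; replace with the ones your schema defines.
exprs = aliased_exprs("k", ["t", "o", "c"], prefix="kline")
print(exprs)  # ['k.t AS kline_t', 'k.o AS kline_o', 'k.c AS kline_c']

# With a real DataFrame (not executed here): kline_df.selectExpr(*exprs)
```

Because every output column carries the prefix, nothing from k can collide with a top-level t. If the struct has fields that differ only by case, give them distinct aliases by hand, since a shared prefix alone would not separate them.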