13 hours ago
Hi,
1. What situations typically trigger AMBIGUOUS_REFERENCE_TO_FIELDS?
It occurs when Spark finds multiple columns with the same name at the same nesting level in a DataFrame. The most common causes are:
- Wildcard expansion: using .select("json.*") followed by .select("*", "k.*") leaves both a struct field k (containing nested t) and a flat field t at the top level
- Union/join collisions: combining DataFrames that both have a field named t without proper aliasing
- Duplicate schema definitions: defining the same field twice in a StructType
2. Can nested fields in a separate schema cause this error?
Not by themselves. A trade schema with t and a kline schema with k.t are each fine on their own.
The problem arises when you:
- Expand struct fields with wildcards (select("k.*") promotes nested k.t to top-level t)
- Combine both streams without distinct column names
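To make the promotion problem concrete, here is a minimal plain-Python sketch that only works on lists of names, so it runs without a Spark session. The helper colliding_names and the field lists are hypothetical and chosen for illustration; they are not taken from your schemas. The comparison ignores case because Spark resolves names case-insensitively unless spark.sql.caseSensitive is changed.

```python
def colliding_names(existing_columns, struct_fields):
    """Return struct fields that a wildcard such as select("k.*") would
    promote onto a name that already exists at the top level."""
    existing = {c.lower() for c in existing_columns}
    return [f for f in struct_fields if f.lower() in existing]


# Hypothetical column names, for illustration only.
trade_columns = ["s", "t", "p"]        # flat columns of the trade stream
kline_struct_fields = ["t", "o", "c"]  # fields nested under the kline struct k

print(colliding_names(trade_columns, kline_struct_fields))  # ['t']
```

Any name this returns is a candidate for an explicit alias before the two streams are combined.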
3. Which operations typically introduce the duplicate column?
- Wildcard expansion (most common)
- Column expansion with select("*") or select("struct_field.*")
- Union/join operations without explicit column selection
4. What debugging steps would you recommend to identify which DataFrame contains the duplicate field?
Start with the parsing code:
- Print df.columns after each transformation to spot duplicates
- Call df.printSchema() to see whether the field appears at multiple levels
- Check for .select("json.*") or .select("*", "k.*") patterns
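The df.columns check can be automated with a small helper. This is a sketch in plain Python that takes a list of names, so you would pass it df.columns after each transformation; find_duplicate_columns is a hypothetical name and the sample list is made up. It compares names case-insensitively by default, which matches Spark's default resolution.

```python
from collections import Counter


def find_duplicate_columns(columns, case_sensitive=False):
    """Return names that occur more than once, in first-seen order."""
    keys = [c if case_sensitive else c.lower() for c in columns]
    counts = Counter(keys)
    seen = set()
    duplicates = []
    for key in keys:
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            duplicates.append(key)
    return duplicates


# Hypothetical result of an expansion like select("*", "k.*").
columns_after_expand = ["t", "k", "t", "x"]
print(find_duplicate_columns(columns_after_expand))  # ['t']
```

The first transformation after which this returns a non-empty list is the one that introduced the duplicate.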
For the kline stream, use explicit nested field paths with proper aliasing, as you already do for the trade stream.
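One way to do that without typing every alias by hand is to generate the expressions and pass them to selectExpr. This is only a sketch: aliased_exprs, the json.k path, the field names, and the kline prefix are assumptions for illustration, so adjust them to your actual schema.

```python
def aliased_exprs(struct_path, fields, prefix):
    """Build one 'path.field AS prefix_field' expression per nested field."""
    return [f"{struct_path}.{name} AS {prefix}_{name}" for name in fields]


# Hypothetical path and field names, for illustration only.
exprs = aliased_exprs("json.k", ["t", "o", "c"], "kline")
for expr in exprs:
    print(expr)

# With a real DataFrame this list would be used as: df.selectExpr(*exprs)
```

Because every promoted field gets a stream-specific prefix, kline_t can no longer collide with the trade stream's t.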