cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

DLT schema ambiguity

merca
Valued Contributor II

I have schema:

|    |-- costCentres: struct (nullable = true)
 |    |    |-- dimension1: struct (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- value: string (nullable = true)
 |    |    |-- dimension10: struct (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- value: string (nullable = true)

When I use dataframe to select and save:

df = df_positions.selectExpr(
    "positions.costCentres.dimension1.value as u_kb01",
    "positions.costCentres.dimension10.value as u_kb10",
    "positions.costCentres.dimension2.value as u_kb02",
    "positions.costCentres.dimension3.value as u_kb03",
    "positions.costCentres.dimension4.value as u_kb04",
    "positions.costCentres.dimension5.value as u_kb05",
    "positions.costCentres.dimension6.value as u_kb06",
    "positions.costCentres.dimension7.value as u_kb07",
    "positions.costCentres.dimension8.value as u_kb08",
    "positions.costCentres.dimension9.value as u_kb09",
).distinct()
 
df.write.saveAsTable("test_costcenters")
df.write.save("/temp/test_costcenters")

I get required result and I'm happy.

When I do the same in Delta Live tables:

def gold_costcenter(): 
    return df_positions.selectExpr(
                            "positions.costCentres.dimension1.value as u_kb01",
                            "positions.costCentres.dimension10.value as u_kb10",
                            "positions.costCentres.dimension2.value as u_kb02",
                            "positions.costCentres.dimension3.value as u_kb03",
                            "positions.costCentres.dimension4.value as u_kb04",
                            "positions.costCentres.dimension5.value as u_kb05",
                            "positions.costCentres.dimension6.value as u_kb06",
                            "positions.costCentres.dimension7.value as u_kb07",
                            "positions.costCentres.dimension8.value as u_kb08",
                            "positions.costCentres.dimension9.value as u_kb09",
                        ).distinct()

I get an error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true)
at org.apache.spark.sql.errors.QueryCompilationErrors$.ambiguousReferenceToFieldsError(QueryCompilationErrors.scala:1587)

Why??? And how to resolve this?

2 REPLIES 2

PeteC
New Contributor III

I've got the same problem - but using a SQL Select statement (with some explodes).

PeteC
New Contributor III

A colleague also having the same issue. He thinks he might be close to a solution. I'll update if he does find one.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!