cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

DLT schema ambiguity

merca
Valued Contributor II

I have schema:

|    |-- costCentres: struct (nullable = true)
 |    |    |-- dimension1: struct (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- value: string (nullable = true)
 |    |    |-- dimension10: struct (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- value: string (nullable = true)

When I use dataframe to select and save:

df = df_positions.selectExpr(
    "positions.costCentres.dimension1.value as u_kb01",
    "positions.costCentres.dimension10.value as u_kb10",
    "positions.costCentres.dimension2.value as u_kb02",
    "positions.costCentres.dimension3.value as u_kb03",
    "positions.costCentres.dimension4.value as u_kb04",
    "positions.costCentres.dimension5.value as u_kb05",
    "positions.costCentres.dimension6.value as u_kb06",
    "positions.costCentres.dimension7.value as u_kb07",
    "positions.costCentres.dimension8.value as u_kb08",
    "positions.costCentres.dimension9.value as u_kb09",
).distinct()
 
df.write.saveAsTable("test_costcenters")
df.write.save("/temp/test_costcenters")

I get required result and I'm happy.

When I do the same in Delta Live tables:

def gold_costcenter(): 
    return df_positions.selectExpr(
                            "positions.costCentres.dimension1.value as u_kb01",
                            "positions.costCentres.dimension10.value as u_kb10",
                            "positions.costCentres.dimension2.value as u_kb02",
                            "positions.costCentres.dimension3.value as u_kb03",
                            "positions.costCentres.dimension4.value as u_kb04",
                            "positions.costCentres.dimension5.value as u_kb05",
                            "positions.costCentres.dimension6.value as u_kb06",
                            "positions.costCentres.dimension7.value as u_kb07",
                            "positions.costCentres.dimension8.value as u_kb08",
                            "positions.costCentres.dimension9.value as u_kb09",
                        ).distinct()

I get an error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true), StructField(value,StringType,true)
at org.apache.spark.sql.errors.QueryCompilationErrors$.ambiguousReferenceToFieldsError(QueryCompilationErrors.scala:1587)

Why??? And how to resolve this?

2 REPLIES 2

PeteC
New Contributor III

I've got the same problem - but using a SQL Select statement (with some explodes).

PeteC
New Contributor III

A colleague also having the same issue. He thinks he might be close to a solution. I'll update if he does find one.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group