DBR 14.1 PySpark join on df1["col1"] == df2["col1"] syntax fails
11-15-2023 01:12 AM
Hello
After upgrading my cluster from DBR 12 to 14.1, I get a MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION error on some of my joins:
df1.join(
    df2,
    [df1["name"] == df2["name"], df1["age"] == df2["age"]],
    'left_outer'
)
I resolved it by changing the syntax to:
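from pyspark.sql.functions import col

# Alias both sides so each join key is referenced through its alias
df1.alias("l").join(
    df2.alias("r"),
    [col("l.name") == col("r.name"), col("l.age") == col("r.age")],
    'left_outer'
)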
Is the second syntax the new standard, or am I missing something with the first one?
Thanks

