2 weeks ago
Today my notebook failed while reading a delta load, and I want to report a bug. In PySpark, the withColumns command does not accept an empty dictionary; the code below raises the error shown under Output.
from collections import namedtuple
from pyspark.sql.functions import col

# df and extract_coordinates_udf are assumed to be defined earlier in the notebook.
flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "logic"])
# flat_tuple(old_col, new_col, logic)
flat_tuples = [
    flat_tuple("Coordinates", "Coordinates", extract_coordinates_udf(col("Coordinates")["coordinates"])),
    flat_tuple("CreatedById", "CreatedById", col("CreatedById")["$oid"]),
    flat_tuple("CreationDate", "CreationDate", col("CreationDate")["$date"]["$numberLong"]),
    flat_tuple("Names", "Names", col("Names")[0]["LanguageValue"]),
    flat_tuple("Location", "LocationCoordinates", extract_coordinates_udf(col("Location")["coordinates"])),
    flat_tuple("Location", "LocationType", col("Location")["type"]),
    flat_tuple("_id", "sectorId", col("_id")["$oid"]),
]
# Keep only the expressions whose source column exists in df; the dict is empty when none match.
final_flat_cols = {tup.new_col: tup.logic for tup in flat_tuples if tup.old_col in df.columns}
df = df.withColumns(final_flat_cols)
-- Output
AssertionError: [Trace ID: 00-68d8e7cacb471da60efe65d0ef17703d-a3b270f251715df4-00]
This case is handled in normal PySpark, and I don't want to write a special if-else clause to check the DataFrame's columns before running withColumns. It would be great if it could be handled internally.
Currently, I'm using the following workaround:
flat_col_lst = [tup.logic.alias(tup.new_col) for tup in flat_tuples if tup.old_col in df.columns]
df = df.select('*', *flat_col_lst)
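If keeping the dictionary form is preferable, another option is a small guard that skips the call when the dictionary is empty. This is only a minimal sketch: with_columns_if_any is a hypothetical helper name, not a PySpark API, and it assumes df is a DataFrame that exposes withColumns (available since PySpark 3.3).

```python
from typing import Any, Mapping


def with_columns_if_any(df: Any, cols_map: Mapping[str, Any]) -> Any:
    """Call df.withColumns(cols_map) only when cols_map is non-empty.

    Hypothetical helper, not part of PySpark. `df` is assumed to be a
    DataFrame that exposes a withColumns method.
    """
    if not cols_map:
        # Nothing to add: return the DataFrame unchanged and skip the empty-dict call.
        return df
    return df.withColumns(dict(cols_map))
```

With this in place, the last line of the original snippet would become df = with_columns_if_any(df, final_flat_cols).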
2 weeks ago
Hello @Dhruv-22 ,
I have tested this internally, and this seems to be a bug with the new Serverless environment version 4.
As a workaround, you can try switching the version to 3 as shown below and re-running the above code; it should work.
2 weeks ago
Hey @K_Anudeep
I tried using Environment Version 3, 2, and 1 but still got the same error. Attached is a screenshot with version 3.
2 weeks ago
Hey @Dhruv-22
Did you apply the version and create a new session/clear the existing session before running it? It should work on Env version 3 as mentioned in my repro below.
a week ago
Yeah, I created a new session. I tried it 3-4 times.
a week ago
Sure! Let me try once again and get back to you.
a week ago
Hey @K_Anudeep, did you get anything?