Hi @MDV
I suspect the issue comes from how non-default collations like UTF8_LCASE behave during serialization when you call first() or collect(). As a workaround, wrap the value in a subquery and cast the collation back to UTF8_BINARY before accessing it:
df_result = spark.sql("""
    SELECT ETLLanguageCodes COLLATE UTF8_BINARY AS ETLLanguageCode
    FROM (
        SELECT 'en-us' COLLATE UTF8_LCASE AS ETLLanguageCodes
    ) temp
""")
print(df_result.collect())
If this works, it would confirm that the collation is what's affecting serialization.