Collation problem with df.first() when different from UTF8_BINARY
3 weeks ago - last edited 3 weeks ago
I'm getting an error when I try to select first() from a DataFrame that uses a collation other than UTF8_BINARY.
This works:
df_result = spark.sql(f"""
SELECT 'en-us' AS ETLLanguageCode
""")
display(df_result)
print(df_result.collect())
print(df_result.first())
print(df_result.first().asDict())
When I run this:
df_result = spark.sql(f"""
SELECT 'en-us' COLLATE UTF8_LCASE AS ETLLanguageCode
""")
display(df_result)
print(df_result.collect())
print(df_result.first())
print(df_result.first().asDict())
I'm getting an error because first() is empty, even though count() on the DataFrame reports 1.
What can I do to resolve this? All the string columns in my tables use UTF8_LCASE.
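A possible workaround, assuming the problem is the non-default collation carried in the result schema: cast the column back to a plain STRING, which should give it the session-default UTF8_BINARY collation before you call collect() or first(). This is a sketch of the idea, not a confirmed fix:

```sql
-- Hypothetical workaround: CAST to STRING should reset the column to the
-- default collation (UTF8_BINARY) so the driver-side row fetch works again.
SELECT CAST('en-us' COLLATE UTF8_LCASE AS STRING) AS ETLLanguageCode
```

If this works, the cast can be applied per column just before collecting, while the stored tables keep UTF8_LCASE for comparisons.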
Settings:
- Workers: 1 (16 GB memory, 4 cores)
- Driver: 1 (16 GB memory, 4 cores)
- Runtime: 16.3.x-scala2.12
- Unity Catalog, Photon
- Instance type: Standard_D4ds_v5
1 REPLY
3 weeks ago
Another example:

