Collation problem with df.first() when different from UTF8_BINARY
3 weeks ago - last edited 3 weeks ago
I'm getting an error when I try to select first() from a DataFrame that uses a collation other than UTF8_BINARY.
This works:
df_result = spark.sql(f"""
SELECT 'en-us' AS ETLLanguageCode
""")
display(df_result)
print(df_result.collect())
print(df_result.first())
print(df_result.first().asDict())
When I run this:
df_result = spark.sql(f"""
SELECT 'en-us' COLLATE UTF8_LCASE AS ETLLanguageCode
""")
display(df_result)
print(df_result.collect())
print(df_result.first())
print(df_result.first().asDict())
I'm getting an error because first() is empty, even though count() on the DataFrame reports 1.
What can I do to resolve this? All the string columns in my tables use UTF8_LCASE.
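A possible workaround, assuming the problem is the non-default collation carried in the result schema: cast the column back to a plain STRING, which should give it the session-default UTF8_BINARY collation before you call collect() or first(). This is a sketch of the idea, not a confirmed fix:

```sql
-- Hypothetical workaround: CAST to STRING should reset the column to the
-- default collation (UTF8_BINARY) so the driver-side row fetch works again.
SELECT CAST('en-us' COLLATE UTF8_LCASE AS STRING) AS ETLLanguageCode
```

If this works, the cast can be applied per column just before collecting, while the stored tables keep UTF8_LCASE for comparisons.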
Settings:
- Workers: 1 (16 GB memory, 4 cores)
- Driver: 1 (16 GB memory, 4 cores)
- Runtime: 16.3.x-scala2.12
- Unity Catalog, Photon
- Instance type: Standard_D4ds_v5
1 REPLY
3 weeks ago
Another example:

