<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Collation problem with df.first() when different from UTF8_BINARY in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/collation-problem-with-df-first-when-different-from-utf8-binary/m-p/114301#M3208</link>
    <description>&lt;P&gt;I'm getting a error when I want to select the first() from a dataframe when using a collation different than UTF8_BINARY&lt;/P&gt;&lt;P&gt;This works :&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_result &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; spark&lt;/SPAN&gt;&lt;SPAN&gt;.sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;'en-us'&lt;/SPAN&gt; &lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; ETLLanguageCode&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;display&lt;/SPAN&gt;&lt;SPAN&gt;(df_result)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;collect&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;().&lt;/SPAN&gt;&lt;SPAN&gt;asDict&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;When I run this :&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_result &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; spark&lt;/SPAN&gt;&lt;SPAN&gt;.sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;'en-us'&lt;/SPAN&gt; &lt;SPAN&gt;COLLATE&lt;/SPAN&gt;&lt;SPAN&gt; UTF8_LCASE &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; ETLLanguageCode&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;display&lt;/SPAN&gt;&lt;SPAN&gt;(df_result)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;collect&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;().&lt;/SPAN&gt;&lt;SPAN&gt;asDict&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I'm getting an error because the first() is empty, the count from the df says 1&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;What can I do to resolve this ? My tables are all UTF8_LCASE for the strings.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Settings :&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;1-1 Worker&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;16-16&amp;nbsp;GB Memory&lt;/SPAN&gt;&lt;SPAN&gt;4-4&amp;nbsp;Cores&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;1 Driver&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;16&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;GB Memory,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;4&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Cores&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Runtime&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;16.3.x-scala2.12&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Unity Catalog&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Photon&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Standard_D4ds_v5&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 02 Apr 2025 15:00:54 GMT</pubDate>
    <dc:creator>MDV</dc:creator>
    <dc:date>2025-04-02T15:00:54Z</dc:date>
    <item>
      <title>Collation problem with df.first() when different from UTF8_BINARY</title>
      <link>https://community.databricks.com/t5/administration-architecture/collation-problem-with-df-first-when-different-from-utf8-binary/m-p/114301#M3208</link>
      <description>&lt;P&gt;I'm getting a error when I want to select the first() from a dataframe when using a collation different than UTF8_BINARY&lt;/P&gt;&lt;P&gt;This works :&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_result &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; spark&lt;/SPAN&gt;&lt;SPAN&gt;.sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;'en-us'&lt;/SPAN&gt; &lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; ETLLanguageCode&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;display&lt;/SPAN&gt;&lt;SPAN&gt;(df_result)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;collect&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;().&lt;/SPAN&gt;&lt;SPAN&gt;asDict&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;When I run this :&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_result &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; spark&lt;/SPAN&gt;&lt;SPAN&gt;.sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;'en-us'&lt;/SPAN&gt; &lt;SPAN&gt;COLLATE&lt;/SPAN&gt;&lt;SPAN&gt; UTF8_LCASE &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; ETLLanguageCode&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;display&lt;/SPAN&gt;&lt;SPAN&gt;(df_result)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;collect&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt;(df_result.&lt;/SPAN&gt;&lt;SPAN&gt;first&lt;/SPAN&gt;&lt;SPAN&gt;().&lt;/SPAN&gt;&lt;SPAN&gt;asDict&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I'm getting an error because the first() is empty, the count from the df says 1&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;What can I do to resolve this ? My tables are all UTF8_LCASE for the strings.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Settings :&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;1-1 Worker&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;16-16&amp;nbsp;GB Memory&lt;/SPAN&gt;&lt;SPAN&gt;4-4&amp;nbsp;Cores&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;1 Driver&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;16&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;GB Memory,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;4&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Cores&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Runtime&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;16.3.x-scala2.12&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Unity Catalog&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Photon&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Standard_D4ds_v5&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 02 Apr 2025 15:00:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/collation-problem-with-df-first-when-different-from-utf8-binary/m-p/114301#M3208</guid>
      <dc:creator>MDV</dc:creator>
      <dc:date>2025-04-02T15:00:54Z</dc:date>
    </item>
    <item>
      <title>Re: Collation problem with df.first() when different from UTF8_BINARY</title>
      <link>https://community.databricks.com/t5/administration-architecture/collation-problem-with-df-first-when-different-from-utf8-binary/m-p/114372#M3211</link>
      <description>&lt;P&gt;other example :&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 07:51:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/collation-problem-with-df-first-when-different-from-utf8-binary/m-p/114372#M3211</guid>
      <dc:creator>MDV</dc:creator>
      <dc:date>2025-04-03T07:51:14Z</dc:date>
    </item>
  </channel>
</rss>

