<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to make delta table column values case-insensitive? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12781#M7543</link>
    <description>&lt;P&gt;To make the comparisons case insensitive we can compare after converting columns to upper or lower case. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Eg: &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import org.apache.spark.sql.functions._&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.filter(lower($"colname") == lower($"colname"))&lt;/P&gt;</description>
    <pubDate>Wed, 10 Nov 2021 19:26:38 GMT</pubDate>
    <dc:creator>Sandeep</dc:creator>
    <dc:date>2021-11-10T19:26:38Z</dc:date>
    <item>
      <title>How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12775#M7537</link>
      <description>&lt;P&gt;&amp;nbsp;we have many delta tables with&amp;nbsp;string columns as unique key (PK in traditional relational db) and we don't want to insert new row because key value only differs in case.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Its lot of code change to use upper/lower function on column value compare (in upsert logic) so looking for alternative&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;i see a CHECK constraint on delta table column can enforce consistent case value but its too late , i already have mixed case data in tables.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;is there anything similar to sql server collation feature?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.conf.set('spark.sql.caseSensitive', False) does not work as expected (meaning string comparison between mixed case value shows i have 2 different strings)&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also looked up spark.conf.set('spark.databricks.analyzer.batchResolveRelations', False) in vein&lt;/P&gt;&lt;P&gt;I have tried 7.3LTS and 9.1LTS databricks on azure&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 19:56:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12775#M7537</guid>
      <dc:creator>prasadvaze</dc:creator>
      <dc:date>2021-10-20T19:56:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12777#M7539</link>
      <description>&lt;P&gt;I don't recall there being a collation in Spark/Delta Lake.&lt;/P&gt;&lt;P&gt;Also data corruption/loss is definitely a main focus of Databricks so I don´t think there is an easy way for fixing this.&lt;/P&gt;&lt;P&gt;What I would do is the following:&lt;/P&gt;&lt;P&gt;overwrite the tables which have mixed cases to uppercase (or lowercase, your choice).&lt;/P&gt;&lt;P&gt;That fixes your current data.&lt;/P&gt;&lt;P&gt;For the data which you want to upsert, you can create a wrapper function around spark.read.parquet (or csv or json or whatever you are ingesting) which translates string columns to uppercase.&lt;/P&gt;&lt;P&gt;We have to do this for a similar issue (trim all string columns).&lt;/P&gt;&lt;P&gt;Or you can just always call the upper/lower function.&lt;/P&gt;&lt;P&gt;Perhaps you can even translate everything to upper/lower while copying it to your storage.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But both cases require work.&lt;/P&gt;&lt;P&gt;I don´t see a quick solution.&lt;/P&gt;&lt;P&gt;You can ofc leave the data as it is, and downstream always take into account that the data is mixed case.  So when reading in this mixed data, always apply an upper in filters etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Oct 2021 11:27:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12777#M7539</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-21T11:27:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12778#M7540</link>
      <description>&lt;P&gt;hi @prasad vaze​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;By default "spark.sql.caseSensitive" will be false. If you would like to enforce it, then you will need to set it to true. This feature was added since Spark 2.x, you could find more details in this &lt;A href="https://issues.apache.org/jira/browse/SPARK-15229" alt="https://issues.apache.org/jira/browse/SPARK-15229" target="_blank"&gt;Jira&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Like @Werner Stinckens​&amp;nbsp;said, you can rename the column names and create a new table with the new cols&lt;/P&gt;</description>
      <pubDate>Thu, 21 Oct 2021 16:33:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12778#M7540</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-21T16:33:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12779#M7541</link>
      <description>&lt;P&gt;@Jose Gonzalez​&amp;nbsp; I am solving for case-sensitive values inside the column and not the case-sensitive name of the column.  spark.sql.caseSensitive = Ture makes  columns names case-sensitive and not the value in the column&lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 14:57:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12779#M7541</guid>
      <dc:creator>prasadvaze</dc:creator>
      <dc:date>2021-10-30T14:57:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12780#M7542</link>
      <description>&lt;P&gt;@Kaniz Fatma​&amp;nbsp; Thanks for reaching out Kaniz.  After much research, I came to the conclusion that this feature is not available in spark or delta lake so this would be a new feature request for spark community.  JSON  and Parquet are case-sensitive so this may be root cause . &lt;/P&gt;</description>
      <pubDate>Sat, 30 Oct 2021 14:59:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12780#M7542</guid>
      <dc:creator>prasadvaze</dc:creator>
      <dc:date>2021-10-30T14:59:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12781#M7543</link>
      <description>&lt;P&gt;To make the comparisons case insensitive we can compare after converting columns to upper or lower case. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Eg: &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import org.apache.spark.sql.functions._&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.filter(lower($"colname") == lower($"colname"))&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 19:26:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12781#M7543</guid>
      <dc:creator>Sandeep</dc:creator>
      <dc:date>2021-11-10T19:26:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12782#M7544</link>
      <description>&lt;P&gt;Thank you for the update. Could you mark your response as best answer? it might help for future references. &lt;/P&gt;</description>
      <pubDate>Thu, 18 Nov 2021 17:49:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12782#M7544</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-18T17:49:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to make delta table column values case-insensitive?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12783#M7545</link>
      <description>&lt;P&gt;Well, the unintended benefit is now I am using int\big int as surrogate keys&lt;/P&gt;&lt;P&gt;for all tables (preferred in DW).   &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All joins are made on integer data types. Query efficiency is also improved.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The string matching using upper() is done only on ETL when comparing source and target data.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Nov 2021 04:49:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-delta-table-column-values-case-insensitive/m-p/12783#M7545</guid>
      <dc:creator>lizou</dc:creator>
      <dc:date>2021-11-29T04:49:48Z</dc:date>
    </item>
  </channel>
</rss>

