<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to remove more than 4 byte characters using pyspark in databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-remove-more-than-4-byte-characters-using-pyspark-in/m-p/27809#M19657</link>
    <description>&lt;P&gt;Assuming you have a string-type column in a PySpark DataFrame, one possible approach is:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Identify the total number of characters for each value in the column (say &lt;/LI&gt;&lt;LI&gt;Identify the number of bytes taken by each character (say b).&lt;/LI&gt;&lt;LI&gt;Use the substring() function to select the first n characters, where n = floor(4 / b).&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Tue, 29 Nov 2022 11:00:30 GMT</pubDate>
    <dc:creator>Shalabh007</dc:creator>
    <dc:date>2022-11-29T11:00:30Z</dc:date>
    <item>
      <title>How to remove more than 4 byte characters using pyspark in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-remove-more-than-4-byte-characters-using-pyspark-in/m-p/27808#M19656</link>
      <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;We need to remove characters longer than 4 bytes using PySpark in Databricks, since Amazon Redshift does not support them. Does anyone know how I can accomplish this?&lt;/P&gt;&lt;P&gt;Thank you very much in advance.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 04:49:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-remove-more-than-4-byte-characters-using-pyspark-in/m-p/27808#M19656</guid>
      <dc:creator>eimis_pacheco</dc:creator>
      <dc:date>2022-02-17T04:49:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove more than 4 byte characters using pyspark in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-remove-more-than-4-byte-characters-using-pyspark-in/m-p/27809#M19657</link>
      <description>&lt;P&gt;Assuming you have a string-type column in a PySpark DataFrame, one possible approach is:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Identify the total number of characters for each value in the column (say &lt;/LI&gt;&lt;LI&gt;Identify the number of bytes taken by each character (say b).&lt;/LI&gt;&lt;LI&gt;Use the substring() function to select the first n characters, where n = floor(4 / b).&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Tue, 29 Nov 2022 11:00:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-remove-more-than-4-byte-characters-using-pyspark-in/m-p/27809#M19657</guid>
      <dc:creator>Shalabh007</dc:creator>
      <dc:date>2022-11-29T11:00:30Z</dc:date>
    </item>
  </channel>
</rss>

