Shalabh007
Honored Contributor

Assuming you have a string-type column in a PySpark DataFrame, one possible approach is:

  1. Identify the total number of characters for each value in the column (say
  2. Identify the number of bytes taken by each character (say b).
  3. Use the substring() function to select the first n characters, where n = floor(4 / b).
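
The steps above could be sketched roughly as follows. This is only an illustration, not a tested solution for your data: `max_chars` and `truncate_column` are hypothetical helper names, the extra `char_count` and `byte_count` columns are there purely for inspection, and the code assumes every character takes the same number of bytes b (which holds for ASCII-only text in UTF-8, where b = 1, but not for mixed-width text). The byte budget is passed as a parameter; 4 is just the value used in step 3.

```python
import math


def max_chars(byte_budget: int, bytes_per_char: int) -> int:
    """Step 3: n = floor(byte_budget / bytes_per_char)."""
    if bytes_per_char <= 0:
        raise ValueError("bytes_per_char must be positive")
    return math.floor(byte_budget / bytes_per_char)


def truncate_column(df, col_name: str, byte_budget: int, bytes_per_char: int):
    """Keep only the first n characters of a string column (hypothetical helper)."""
    # Imported lazily so max_chars() can be used without a Spark installation.
    from pyspark.sql import functions as F

    n = max_chars(byte_budget, bytes_per_char)
    return (
        df
        # Step 1: number of characters in each value.
        .withColumn("char_count", F.length(F.col(col_name)))
        # Step 2 (sanity check): number of bytes in each value when encoded as UTF-8.
        .withColumn("byte_count", F.length(F.encode(F.col(col_name), "UTF-8")))
        # Step 3: keep the first n characters; substring() positions are 1-based.
        .withColumn(col_name, F.substring(F.col(col_name), 1, n))
    )
```

For example, `truncate_column(df, "text", 4, 2)` would keep the first 2 characters of a column named `text` (an assumed column name). Comparing `char_count` with `byte_count` is one way to check whether the fixed bytes-per-character assumption actually holds for your values.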