You can use PySpark's regexp_replace to remove unwanted Unicode characters.

This example removes the null character (U+0000). To remove a different character, look up its code point and use that in the pattern instead.

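# Replace the null character (U+0000) with an empty string.
# Assumes df is an existing DataFrame with 'id' and 'description' columns.
from pyspark.sql.functions import regexp_replace

null_char = u'\u0000'
dfCnae = df\
    .withColumn('id', regexp_replace(df['id'], null_char, ''))\
    .withColumn('description', regexp_replace(df['description'], null_char, ''))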
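
If you need to strip more than one character, one option is to build a regex character class from the code points and pass that as the pattern to regexp_replace. The sketch below is only an illustration: build_char_class is a hypothetical helper (not part of PySpark), the three code points are example choices, and it only handles code points up to U+FFFF. It checks the pattern locally with Python's re module, which accepts the same \uXXXX escapes as the Java regex engine Spark uses.

```python
import re


def build_char_class(code_points):
    """Return a regex character class matching any of the given code points.

    Hypothetical helper; only supports code points in the range U+0000..U+FFFF.
    """
    if not code_points:
        raise ValueError("at least one code point is required")
    parts = []
    for cp in code_points:
        if not 0 <= cp <= 0xFFFF:
            raise ValueError(f"code point out of supported range: {cp:#x}")
        # Emit a \uXXXX escape so the pattern contains no raw control characters
        parts.append(f"\\u{cp:04X}")
    return "[" + "".join(parts) + "]"


# Example set (an assumption): NUL, zero-width space, byte order mark
pattern = build_char_class([0x0000, 0x200B, 0xFEFF])

# Quick local check before using the pattern in Spark
sample = "ab\u0000c\u200bd\ufeff"
cleaned = re.sub(pattern, "", sample)
print(cleaned)
```

The resulting pattern string can then be used in place of the single null character, for example regexp_replace(df['description'], pattern, '').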