I am running Spark 2.4.4 with Python 2.7, and my IDE is PyCharm.
The input file (.csv) contains encoded values in some columns, as shown below.
The file data looks like this:
COL1,COL2,COL3,COL4
CM, 503004, (d$όνυ$F|'.h*Λ!ψμ=(.ξ; ,.ʽ|!3-2-704
The output I am trying to get is:
CM,503004,,3-2-704 ---- all encoded and ASCII values removed.
Code I tried:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Python Spark").getOrCreate()
    df = spark.read.csv("filepath\Customers_v01.csv", header=True, sep=",")
    myres = df.rdd.map(lambda x: x[1].encode().decode('utf-8'))
    print(myres.collect())
But this gives only:
503004 -- only the COL2 value is printed.
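The single value appears because the lambda returns x[1], which is only the second field of each row. Below is a minimal sketch of applying a cleanup to every field instead, written in plain Python so it runs without Spark. The helper names strip_non_ascii and clean_row are hypothetical, the sample row is made up, and "drop every non-ASCII character" is an assumed rule: it would not by itself produce the exact output above, because the garbled value also contains ASCII punctuation and a comma.

```python
def strip_non_ascii(value):
    """Drop every character outside the 7-bit ASCII range (assumed rule)."""
    if value is None:  # an empty CSV field may come through as None
        return None
    return u"".join(ch for ch in value if ord(ch) < 128)


def clean_row(row):
    """Apply the cleanup to every field of the row, not only row[1]."""
    return [strip_non_ascii(field) for field in row]


# Made-up row for illustration; not taken from the real file.
sample = (u"CM", u"503004", u"ab\u03cc\u03bdcd", u"3-2-704")
cleaned = clean_row(sample)
print(cleaned)
```

Since a Spark Row can be iterated like a tuple, the same function could probably be passed as df.rdd.map(clean_row) in place of the lambda.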
Please share your suggestions. Is it possible to fix this issue in PySpark?
Thanks a lot