Hello @AmarKap,
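When Spark decodes CP1252 bytes as UTF-8 (the default), bytes that are not valid UTF-8, such as CP1252 curly quotes, show up as the replacement character �. Decoding them as ISO-8859-1 would not produce �; those bytes would instead become invisible control characters.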
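Here is a small plain-Python sketch (no Spark involved) of what happens to a single CP1252 byte under each decoding. The sample string is just an illustration I made up, not something from your file:

```python
# Illustrative only: a made-up sample string containing a curly apostrophe (U+2019).
raw = "It\u2019s".encode("cp1252")   # CP1252 stores U+2019 as the single byte 0x92

# Decoding as UTF-8: 0x92 is not a valid UTF-8 sequence on its own,
# so it is swapped for the replacement character U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding as ISO-8859-1: every byte maps to a code point, so nothing is
# replaced, but 0x92 becomes an invisible C1 control character (U+0092).
as_latin1 = raw.decode("latin-1")

# Decoding as CP1252: the original curly apostrophe comes back.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8), repr(as_latin1), repr(as_cp1252))
```

If the text in your DataFrame matches the first case, the source file is most likely CP1252 (or a similar single-byte encoding) being read as UTF-8.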
Can you try reading the file like this:
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "text")
      .option("encoding", "windows-1252")  # or "CP1252"
      .load("s3://.../path"))
Anudeep