<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192982"&gt;@AmarKap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;When Spark decodes CP1252-encoded bytes with the wrong charset (for example UTF-8), the undecodable byte sequences are rendered as the replacement character&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you try reading the file as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;df = (spark.readStream&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "windows-1252")&amp;nbsp; # or "CP1252"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .load("s3://.../path"))&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Oct 2025 07:38:13 GMT</pubDate>
    <dc:creator>K_Anudeep</dc:creator>
    <dc:date>2025-10-21T07:38:13Z</dc:date>
    <item>
      <title>Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135440#M50351</link>
      <description>&lt;P&gt;Trying to read an accented file (French characters), but spark.readStream fails and special characters turn into something strange (e.g.&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;).&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;spark.readStream&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .format("cloudfiles")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "ISO-8859-1")&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Tried both ISO-8859-1 and UTF-8.&lt;BR /&gt;Tried with and without .option("cloudFiles.format", "text").&lt;BR /&gt;Files do not contain a .txt extension.&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 20 Oct 2025 18:05:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135440#M50351</guid>
      <dc:creator>AmarKap</dc:creator>
      <dc:date>2025-10-20T18:05:21Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192982"&gt;@AmarKap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;When Spark decodes CP1252-encoded bytes with the wrong charset (for example UTF-8), the undecodable byte sequences are rendered as the replacement character&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you try reading the file as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;df = (spark.readStream&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "windows-1252")&amp;nbsp; # or "CP1252"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .load("s3://.../path"))&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2025 07:38:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-21T07:38:13Z</dc:date>
    </item>
  </channel>
</rss>