Data ingestion issue with Thai data
a month ago
I have a use case where my file contains data in Thai characters. The source location is Azure Blob Storage, where the files are stored in text format. I am using the following code to read the file, but when I download the data from the catalog it encloses the data in quotes, which I don't want.
from pyspark.sql.functions import expr

input_df = (
    spark.read.format("text")
    .option("ignoreLeadingWhiteSpace", "false")
    .option("ignoreTrailingWhiteSpace", "false")
    .option("encoding", encoding)
    .option("keepUndefinedRows", True)
    .load(file_path)
    # strip a single leading and trailing double quote after decoding
    .withColumn("decoded_text", expr(f"regexp_replace(decode(value, '{encoding}'), '^\"|\"$', '')"))
    .drop("value")
    .withColumnRenamed("decoded_text", "value")
)
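For reference, the `regexp_replace` pattern above removes only one leading and one trailing double quote, leaving interior quotes intact. A pure-Python sketch of the same pattern (the Thai sample string is illustrative):

```python
import re

def strip_enclosing_quotes(line: str) -> str:
    # Same pattern as the Spark expression: remove a single leading
    # and/or trailing double quote; interior quotes are untouched.
    return re.sub(r'^"|"$', '', line)

print(strip_enclosing_quotes('"สวัสดีชาวโลก"'))   # สวัสดีชาวโลก
print(strip_enclosing_quotes('a "quoted" word'))  # a "quoted" word
```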
Labels:
- Delta Lake
- Spark
- Workflows
1 REPLY
a month ago
Do the quotes exist in the original data?
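One way to check locally (a sketch, not part of the original post; it assumes you can download a copy of the blob file, and that `encoding` matches the source, e.g. "TIS-620" or "UTF-8" for Thai text):

```python
import tempfile

def first_line_is_quoted(path: str, encoding: str) -> bool:
    # Read the raw file with the source encoding and check whether
    # the first line is already wrapped in double quotes.
    with open(path, encoding=encoding) as f:
        line = f.readline().rstrip("\r\n")
    return line.startswith('"') and line.endswith('"')

# Demo with a temporary file standing in for the downloaded blob.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write('"สวัสดี"\n')
    sample_path = f.name

print(first_line_is_quoted(sample_path, "utf-8"))  # True
```

If the quotes are present in the raw bytes, the files were likely exported in a CSV-like quoted style, and stripping them (or reading with a CSV parser) is appropriate; if not, the quoting is being introduced downstream.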

