DataFrame to CSV write has issues due to multiple commas inside a row value

sai_sathya
New Contributor III

Hi all

I am working on converting data that contains JSON fields with embedded commas into CSV format. I am facing challenges because the commas within the JSON are being misinterpreted as column delimiters during the conversion.

I tried several ways of modifying the data, including escaping the commas inside the row value, but none of them worked. At the same time, I need to make sure the JSON keeps its correct syntax, because it will be used elsewhere in the project.

Here are the sample PySpark code and data that I worked with:

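from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Reuses the notebook's existing session on Databricks, or creates one elsewhere
spark = SparkSession.builder.getOrCreate()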


schema = StructType([
    StructField("IDcol1", IntegerType(), True),
    StructField("IDcol2", IntegerType(), True),
    StructField("AdditionalRequestParameters", StringType(), True),
    StructField("RequestURL", StringType(), True)
])

data = [
    (1, 2, "{'Locale':'en','KnowledgeType':[{'Name':'IndustryKnowledge'},{'Name':'KnowledgeIndustry'}],'SegmentCountry':[{'Country':'US','IndustrySegment':'IP'}],'Setversion':'VersionValue','Flags':null,'ScalingID':0,'VersionInfo':{'PracticeSubType':'SubPracticeValue','Version':'VersionValue'}}", 'https://abc/something/API/2021v3/nothing/data'),
    
]

df = spark.createDataFrame(data, schema=schema)
df.display()

Displayed as a DataFrame, the data looks correct:

sai_sathya_0-1712850570456.png

That is how the data is expected to look. However, when I write it to a CSV file in my ADLS container, it misbehaves and creates a new column for every comma inside the JSON column.

I am also unable to read the CSV data back properly; I tried display() and show(). When I looked at the CSV file generated in the container, this is what I found:

sai_sathya_1-1712850991923.png

Please help me handle these commas. Thanks
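
To make the symptom concrete, here is a minimal sketch of the quoting behaviour involved. It uses Python's standard csv module rather than Spark, and the shortened JSON string and variable names are only illustrative assumptions. It shows that a field with embedded commas survives a round trip when it is wrapped in quotes and read by a quote-aware parser, while a plain split on commas breaks it apart. Spark's CSV writer and reader expose comparable `quote`, `escape` and `quoteAll` options; I have not confirmed which combination applies to the file in the screenshot.

```python
import csv
import io

# Hypothetical sample row mirroring the post: two IDs, a JSON-like string
# with embedded commas (shortened here), and a URL.
json_text = "{'Locale':'en','KnowledgeType':[{'Name':'IndustryKnowledge'},{'Name':'KnowledgeIndustry'}]}"
row = [1, 2, json_text, "https://abc/something/API/2021v3/nothing/data"]

# Write the row with every field wrapped in double quotes.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
line = buffer.getvalue()

# A quote-aware reader returns four fields and leaves the JSON text unchanged.
parsed = next(csv.reader(io.StringIO(line)))

# A naive split on "," ignores the quotes and cuts the JSON into pieces,
# which matches the "new column for every comma" symptom.
naive = line.strip().split(",")

print(len(parsed), len(naive))
```

If the file on ADLS already contains the quoted form, the extra columns may come from whatever opens the file rather than from the write itself, so it is worth checking the raw text of one line first.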