01-13-2022 04:39 AM
I have observed very strange behavior in some of our integration pipelines. This week, one of the CSV files was getting broken when read with the read function below.
```python
def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8", multiLine="true"):
    # `spark` is the SparkSession provided by the Databricks notebook
    deltas_df = spark.read \
        .format('csv') \
        .options(header=header, delimiter=delimiter, timestampFormat=timestampformat, encoding=encode, multiLine=multiLine) \
        .schema(schema=schema_struct).load(files)
    return deltas_df
```
I then moved the schema into the options. This worked, and the file for that object could be read, but it started failing for the other objects. So I am wondering why it would behave so differently.
```python
def ReadCSV2(files, schema_struct, header, delimiter, timestampformat, encode="utf8"):
    # Same reader, but the schema is passed as an option instead of via .schema()
    deltas_df = spark.read \
        .format('csv') \
        .options(header=header, delimiter=delimiter, timestampFormat=timestampformat, encoding=encode, multiLine="true", schema=schema_struct) \
        .load(files)
    return deltas_df
```
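One way to narrow this down, sketched below under the assumption that both functions return a DataFrame you can inspect: compare the schema Spark actually used (`df.schema`) with `schema_struct`, field by field. The helper `schema_differences` is hypothetical, not a Spark API; it only relies on the `.fields`, `.name` and `.dataType` attributes that `pyspark.sql.types.StructType` and `StructField` provide. If the DataFrame from `ReadCSV2` reports different column names or types than `schema_struct`, the schema passed through `.options()` was not applied the same way as the one passed to `.schema()`.

```python
from typing import Any, List, Tuple


def schema_differences(actual_schema: Any, expected_schema: Any) -> List[Tuple[str, str, str]]:
    """Return (field, actual_type, expected_type) for every mismatch.

    Works on pyspark.sql.types.StructType or anything with a .fields list
    whose items have .name and .dataType. Fields present on only one side
    are reported with "<missing>" on the other. Names are compared as-is
    (case-sensitive).
    """
    actual = {f.name: str(f.dataType) for f in actual_schema.fields}
    expected = {f.name: str(f.dataType) for f in expected_schema.fields}
    diffs = []
    for name in sorted(set(actual) | set(expected)):
        a_type = actual.get(name, "<missing>")
        e_type = expected.get(name, "<missing>")
        if a_type != e_type:
            diffs.append((name, a_type, e_type))
    return diffs
```

Call it as `schema_differences(df.schema, schema_struct)`, where `df` is the DataFrame returned by either function; an empty list means the two schemas match by name and type.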
I would like to keep a single function and solve this issue. For now, I have to use two functions.
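Separately, Spark's CSV reader does not reject option names it doesn't recognize: a misspelled key such as `enoding` (for `encoding`) is silently ignored, and the reader falls back to its default encoding (UTF-8). A small pre-flight check can catch such typos. In the sketch below, `KNOWN_CSV_OPTIONS` is a hypothetical, partial allowlist of documented CSV option names and `unknown_options` is an illustrative helper, not a Spark API. Because Spark matches option names case-insensitively, the check does too.

```python
from typing import Dict, Iterable, List

# Hypothetical, partial allowlist of documented Spark CSV reader options.
# Extend it with any other options your pipelines rely on.
KNOWN_CSV_OPTIONS = {
    "header", "sep", "delimiter", "encoding", "charset", "quote", "escape",
    "inferSchema", "enforceSchema", "multiLine", "mode", "dateFormat",
    "timestampFormat", "nullValue", "columnNameOfCorruptRecord",
}


def unknown_options(options: Dict[str, object],
                    known: Iterable[str] = KNOWN_CSV_OPTIONS) -> List[str]:
    """Return the option keys that are not in the allowlist, sorted.

    Spark matches option names case-insensitively, so both sides are
    lower-cased before comparing.
    """
    known_lower = {name.lower() for name in known}
    return sorted(key for key in options if key.lower() not in known_lower)
```

For example, `unknown_options(dict(header="true", enoding="utf8"))` returns `['enoding']`, flagging the key before it reaches `.options()`.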
Accepted Solutions
01-14-2022 06:32 AM
How exactly is it failing?
Maybe there are differences in the CSV headers, including case sensitivity, so `enforceSchema = False` could help.
Regarding the schema: under the hood, both ways point to the same Scala function.
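Following that suggestion, a single function could always pass the schema through `.schema()` and expose `enforceSchema` as a parameter instead of keeping two variants. Per the Spark CSV documentation, `enforceSchema=true` (the default) forcibly applies the given schema and ignores the header, while `enforceSchema=false` with `header=true` validates the header against the schema (by position, honouring `spark.sql.caseSensitive`). The sketch below assumes the session is passed in as `spark`; the function and parameter names are illustrative.

```python
from typing import Any, List, Union


def read_csv(spark: Any, files: Union[str, List[str]], schema_struct: Any,
             header: bool, delimiter: str, timestamp_format: str,
             encoding: str = "utf8", multi_line: bool = True,
             enforce_schema: bool = True) -> Any:
    """Read CSV files with an explicit schema through one code path.

    The schema always goes through .schema(). With enforce_schema=True
    (Spark's default) it is applied as-is and the CSV header is ignored;
    with enforce_schema=False and header=True, Spark checks the header
    against schema_struct instead.
    """
    # PySpark converts Python booleans to "true"/"false" for options.
    reader = (
        spark.read.format("csv")
        .options(
            header=header,
            delimiter=delimiter,
            timestampFormat=timestamp_format,
            encoding=encoding,
            multiLine=multi_line,
            enforceSchema=enforce_schema,
        )
        .schema(schema_struct)
    )
    # load() accepts a single path or a list of paths.
    return reader.load(files)
```

For objects whose headers may not line up with `schema_struct`, calling it with `enforce_schema=False` should turn a silent column misalignment into an explicit error that can be shared here.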
01-13-2022 09:49 AM
Hello @nafri A, my name is Piper, and I'm a moderator for Databricks. Welcome to the community and thank you for your question. I'm sorry to hear you're having trouble. We'll give the community a chance to respond before we circle back around to this. Thanks in advance for your patience.
02-08-2022 04:41 PM
Hi @nafri A,
What error are you getting? Can you share it, please? As @Hubert Dudek mentioned, both approaches call the same APIs.