<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic What is the difference between passing the schema in the options or using the .schema() function in PySpark for a CSV file? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31700#M23085</link>
    <description>&lt;P&gt;I have observed some very strange behavior in a few of our integration pipelines. This week, one of the CSV files was not being read correctly by the read function below.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def ReadCSV(files,schema_struct,header,delimiter,timestampformat,encode="utf8",multiLine="true"):
  deltas_df = spark.read \
      .format('csv') \
      .options(header=header, delimiter=delimiter, timestampFormat=timestampformat, encoding=encode, multiLine=multiLine) \
      .schema(schema=schema_struct).load(files)  
  return deltas_df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I then moved the schema into the options. That worked, and the file for that object was read correctly, but reads for the other objects started failing. So I am wondering why the two would behave so differently.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def ReadCSV2(files,schema_struct,header,delimiter,timestampformat,encode="utf8"):
  deltas_df = spark.read \
      .format('csv') \
      .options(header=header, delimiter=delimiter, timestampFormat=timestampformat, encoding=encode, multiLine="true", schema=schema_struct) \
      .load(files)  
  return deltas_df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I would like to keep a single function that works for every object and solves this issue. For now I have to use two functions.&lt;/P&gt;</description>
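A minimal sketch of one reader function that could cover both cases, assuming `spark` is an active `SparkSession` (as in a Databricks notebook) and `schema_struct` is a `StructType`. It spells the CSV option as `encoding` (an unrecognised key such as `enoding` is silently ignored by Spark's CSV source), returns the DataFrame it actually built (the original functions return `df`, which is not defined inside them), and applies the schema only through `.schema()`. The helper names `csv_options` and `read_csv` are hypothetical. One detail worth checking when comparing the two original functions: PySpark's `options()` converts each option value to a string before passing it to the JVM, so a `StructType` given as `schema=...` arrives as its string representation, not as a schema object.

```python
def csv_options(header, delimiter, timestamp_format, encoding="utf8", multi_line=True):
    """Build CSV reader options with correctly spelled keys (hypothetical helper)."""

    def as_flag(value):
        # Accept either a bool or a string such as "true" / "True".
        return str(value).lower()

    return {
        "header": as_flag(header),
        "delimiter": delimiter,
        "timestampFormat": timestamp_format,
        "encoding": encoding,  # misspelled as "enoding" in the original functions
        "multiLine": as_flag(multi_line),
    }


def read_csv(spark, files, schema_struct, header, delimiter, timestamp_format,
             encoding="utf8", multi_line=True):
    """Read CSV files with an explicit schema (hypothetical helper).

    Assumes `spark` is an active SparkSession and `schema_struct` a StructType.
    """
    opts = csv_options(header, delimiter, timestamp_format, encoding, multi_line)
    deltas_df = (
        spark.read.format("csv")
        .options(**opts)
        .schema(schema_struct)  # the schema goes through .schema(), not options()
        .load(files)
    )
    return deltas_df  # return the DataFrame that was actually built
```

In a notebook this might be called as `df = read_csv(spark, files, schema_struct, header=True, delimiter=",", timestamp_format="yyyy-MM-dd HH:mm:ss")`, followed by `df.printSchema()` to confirm the column types.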
    <pubDate>Thu, 13 Jan 2022 12:39:22 GMT</pubDate>
    <dc:creator>irfanaziz</dc:creator>
    <dc:date>2022-01-13T12:39:22Z</dc:date>
    <item>
      <title>What is the difference between passing the schema in the options or using the .schema() function in PySpark for a CSV file?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31700#M23085</link>
      <pubDate>Thu, 13 Jan 2022 12:39:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31700#M23085</guid>
      <dc:creator>irfanaziz</dc:creator>
      <dc:date>2022-01-13T12:39:22Z</dc:date>
    </item>
    <item>
      <title>Re: What is the difference between passing the schema in the options or using the .schema() function in PySpark for a CSV file?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31701#M23086</link>
      <description>&lt;P&gt;Hello @nafri A, my name is Piper, and I'm a moderator for Databricks. Welcome to the community, and thank you for your question. I'm sorry to hear you're having trouble. We'll give the community a chance to respond before we circle back to this. Thanks in advance for your patience.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jan 2022 17:49:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31701#M23086</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-01-13T17:49:49Z</dc:date>
    </item>
    <item>
      <title>Re: What is the difference between passing the schema in the options or using the .schema() function in PySpark for a CSV file?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31702#M23087</link>
      <description>&lt;P&gt;How exactly is it failing?&lt;/P&gt;&lt;P&gt;Maybe the CSV headers differ between files, including in letter case, in which case setting enforceSchema to False might help.&lt;/P&gt;&lt;P&gt;Regarding the schema: under the hood, both approaches point to the same Scala function.&lt;/P&gt;</description>
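The `enforceSchema` suggestion can be sketched as below, again assuming an active `SparkSession` named `spark`; `read_csv_checked` and `CHECKED_OPTIONS` are hypothetical names. According to the Spark CSV data source documentation, the default `enforceSchema=true` forcibly applies the given schema and ignores the CSV headers, whereas with `header` true and `enforceSchema` false each header is validated against the schema (names compared by position, respecting `spark.sql.caseSensitive`). A renamed or reordered column should then make the read fail instead of the schema being applied silently by position.

```python
# Options that ask Spark to check each CSV header against the supplied schema.
CHECKED_OPTIONS = {"header": "true", "enforceSchema": "false"}


def read_csv_checked(spark, files, schema_struct, delimiter=",", encoding="utf8"):
    """Read CSV files, validating headers against `schema_struct` (hypothetical helper).

    Assumes `spark` is an active SparkSession and `schema_struct` a StructType.
    A header whose column names do not match the schema should raise an error
    instead of being ignored.
    """
    return (
        spark.read.format("csv")
        .options(delimiter=delimiter, encoding=encoding, **CHECKED_OPTIONS)
        .schema(schema_struct)
        .load(files)
    )
```

Running the failing objects through this variant is one way to see whether a header mismatch is behind the different results.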
      <pubDate>Fri, 14 Jan 2022 14:32:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31702#M23087</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-14T14:32:20Z</dc:date>
    </item>
    <item>
      <title>Re: What is the difference between passing the schema in the options or using the .schema() function in PySpark for a CSV file?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31703#M23088</link>
      <description>&lt;P&gt;Hi @nafri A,&lt;/P&gt;&lt;P&gt;What error are you getting? Can you share it, please? As @Hubert Dudek mentioned, both approaches will call the same APIs.&lt;/P&gt;</description>
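To check on a particular runtime whether the two styles really produce the same DataFrame, one option is to read the same files both ways and diff `DataFrame.dtypes`, which is a list of (column name, type string) pairs. A minimal sketch; `dtype_mismatches` and `compare_schema_paths` are hypothetical helpers, and `spark` and `schema_struct` are assumed to be the session and `StructType` from the question.

```python
from itertools import zip_longest


def dtype_mismatches(expected, actual):
    """Return (position, expected, actual) for every column that differs.

    `expected` and `actual` are lists of (column_name, type_string) pairs,
    the shape returned by pyspark's DataFrame.dtypes. A missing column on
    either side shows up as None.
    """
    return [
        (position, exp, act)
        for position, (exp, act) in enumerate(zip_longest(expected, actual))
        if exp != act
    ]


def compare_schema_paths(spark, files, schema_struct):
    """Read the same files via .schema() and via options(schema=...), then diff."""
    via_method = (
        spark.read.format("csv")
        .option("header", "true")
        .schema(schema_struct)
        .load(files)
    )
    via_options = (
        spark.read.format("csv")
        .options(header="true", schema=schema_struct)
        .load(files)
    )
    return dtype_mismatches(via_method.dtypes, via_options.dtypes)
```

An empty list means both reads agree on every column name and type; any entries show exactly which columns the two styles treat differently.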
      <pubDate>Wed, 09 Feb 2022 00:41:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-passing-the-schema-in-the-options/m-p/31703#M23088</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-02-09T00:41:55Z</dc:date>
    </item>
  </channel>
</rss>

