<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading a csv file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99133#M39912</link>
    <description>&lt;P&gt;Hey, what schema are you referencing? The dates are very inconsistent and unlikely to load as anything useful. It also looks like the comma delimiter is causing you issues, since commas also appear within the body of the text without being quoted each time. If this is a CSV you only need for a one-off, you could export it as a tab-delimited file (or use another delimiter of your choice), which should go some way toward fixing the issue.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Nov 2024 11:12:24 GMT</pubDate>
    <dc:creator>holly</dc:creator>
    <dc:date>2024-11-18T11:12:24Z</dc:date>
    <item>
      <title>Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98793#M39842</link>
      <description>&lt;P&gt;I am trying to read a CSV file into a DataFrame using the csv file format, but I run into formatting and column errors while loading.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="How the data appears in Databricks" style="width: 924px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12896iC58E2239105D381E/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-11-14 172800.png" alt="How the data appears in Databricks" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;How the data appears in Databricks&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The code I used:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.format("csv") \
    .option("header", "true") \
    .option("quote", '"') \
    .option("delimiter", ",") \
    .option("nullValue", "") \
    .option("emptyValue", "NULL") \
    .schema(schema) \
    .load(f"{bronze_folder_path}/Test.csv")&lt;/LI-CODE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="The actual data format" style="width: 832px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12897iAA50B24924118E8A/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-11-14 172727.png" alt="The actual data format" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;The actual data format&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 12:03:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98793#M39842</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-14T12:03:25Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98799#M39846</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132169"&gt;@JissMathew&lt;/a&gt;&amp;nbsp;What is the error that you are getting when trying to load?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 13:12:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98799#M39846</guid>
      <dc:creator>MuthuLakshmi</dc:creator>
      <dc:date>2024-11-14T13:12:37Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98900#M39866</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89478"&gt;@MuthuLakshmi&lt;/a&gt;&amp;nbsp;Actually, "kochi" should be in the "adreess" column, but the columns are mismatched and it ends up in the "name" column. That is the error.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-11-14 172800.png" style="width: 924px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12930i5CCADB63017D8747/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-11-14 172800.png" alt="Screenshot 2024-11-14 172800.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 10:03:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98900#M39866</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-15T10:03:29Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98981#M39878</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132169"&gt;@JissMathew&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Could you also provide a sample CSV file?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 18:01:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/98981#M39878</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-11-15T18:01:48Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99072#M39899</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;, I only have the option to upload files in PNG or JPG format.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 06:06:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99072#M39899</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-18T06:06:15Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99133#M39912</link>
      <description>&lt;P&gt;Hey, what schema are you referencing? The dates are very inconsistent and unlikely to load as anything useful. It also looks like the comma delimiter is causing you issues, since commas also appear within the body of the text without being quoted each time. If this is a CSV you only need for a one-off, you could export it as a tab-delimited file (or use another delimiter of your choice), which should go some way toward fixing the issue.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 11:12:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99133#M39912</guid>
      <dc:creator>holly</dc:creator>
      <dc:date>2024-11-18T11:12:24Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99146#M39915</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36301"&gt;@holly&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Actually, the &lt;SPAN&gt;.option("quote", '"')&lt;/SPAN&gt; setting in the code should have fixed the error, but it is not working. Is there a standard file format for CSV files?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 11:33:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99146#M39915</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-18T11:33:08Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99214#M39924</link>
      <description>&lt;P&gt;The issue is that "kochi" is on a new line. Ideally, I would suggest not generating a CSV file that has line breaks inside column data. But if you need to handle this scenario, you probably need to enclose each column value in quotes in your file, so that a line break inside a value is not treated as the start of a new row.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 16:16:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99214#M39924</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2024-11-18T16:16:29Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99300#M39955</link>
      <description>&lt;P&gt;Is there an option to handle this scenario through the file format, or do we have to edit the source file manually?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2024 08:43:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99300#M39955</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-19T08:43:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99302#M39957</link>
      <description>&lt;P&gt;test&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2024 09:16:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99302#M39957</guid>
      <dc:creator>gilt</dc:creator>
      <dc:date>2024-11-19T09:16:59Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99334#M39971</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/128809"&gt;@gilt&lt;/a&gt;&amp;nbsp;test ????&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2024 13:35:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99334#M39971</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-19T13:35:49Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99580#M40033</link>
      <description>&lt;P&gt;You can try adding the multiLine option:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    .option("multiLine", True)  # let a quoted field span several lines
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)&lt;/LI-CODE&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html" target="_blank" rel="noopener"&gt;https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I also encourage you to use the syntax&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.read
    .some_transformation
)&lt;/LI-CODE&gt;&lt;P&gt;rather than&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read \
    .some_transformation&lt;/LI-CODE&gt;&lt;P&gt;It improves readability and lets you comment out selected lines.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2024 07:31:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99580#M40033</guid>
      <dc:creator>Mike_Szklarczyk</dc:creator>
      <dc:date>2024-11-21T07:31:19Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99597#M40038</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/116039"&gt;@Mike_Szklarczyk&lt;/a&gt;&amp;nbsp; Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. &lt;span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:"&gt;😊&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2024 10:14:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-csv-file/m-p/99597#M40038</guid>
      <dc:creator>JissMathew</dc:creator>
      <dc:date>2024-11-21T10:14:40Z</dc:date>
    </item>
  </channel>
</rss>

