<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Partially upload data of  1.2GB in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80090#M35922</link>
    <description>&lt;P&gt;It has structured data:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;PAT_KEY|ICD_VERSION|ICD_CODE|ICD_PRI_SEC|ICD_POA|&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;|516351692|10|M12.123|A|Y&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rohit&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jul 2024 08:09:22 GMT</pubDate>
    <dc:creator>RohitKulkarni</dc:creator>
    <dc:date>2024-07-23T08:09:22Z</dc:date>
    <item>
      <title>Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80063#M35911</link>
      <description>&lt;P&gt;Hello Team,&lt;/P&gt;&lt;P&gt;I have a 1.2 GB text file (.txt) and I am trying to load its data into a table in an MS SQL Server database. Only part of the data (about 20%) gets loaded.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;Total records in the file: &lt;SPAN&gt;51303483&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Number of records inserted: &lt;SPAN&gt;10224430&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I am using PySpark in Databricks. The file contains no junk characters or blank spaces, but I am still not able to load 100% of the data.&lt;/P&gt;&lt;P&gt;Please advise any workaround.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rohit&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 05:42:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80063#M35911</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-07-23T05:42:22Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80075#M35913</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7159"&gt;@RohitKulkarni&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;&lt;SPAN&gt;But still not able to load 100% data&lt;/SPAN&gt;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;What do you mean by that? Do you get any error messages? How exactly are you reading this file?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:03:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80075#M35913</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-07-23T07:03:52Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80077#M35914</link>
      <description>&lt;P&gt;I read the file with the script below:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os

def split_file_by_size(file_path, chunk_size_mb=500):
    base_name, ext = os.path.splitext(file_path)
    chunk_size = chunk_size_mb * 1024 * 1024  # Convert MB to bytes
    part_number = 1
    buffer = ''
    file_records = {}

    with open(file_path, 'r') as infile:
        while True:
            # readlines(hint) returns whole lines totalling roughly chunk_size characters
            chunk = infile.readlines(chunk_size)
            if not chunk:
                break  # End of file; any remainder in buffer is written below
            buffer += ''.join(chunk)
            if len(buffer) &amp;gt;= chunk_size:
                # Find the end of the last complete record in the buffer
                last_complete_line = buffer.rfind('\n') + 1
                if last_complete_line &amp;gt; 0:
                    part_file_name = f"{base_name}.part{part_number}{ext}"
                    current_chunk = buffer[:last_complete_line]
                    record_count = write_and_count(current_chunk, part_file_name)
                    if record_count &amp;gt; 0:
                        file_records[part_file_name] = record_count
                    buffer = buffer[last_complete_line:]
                    part_number += 1

    # Write whatever is left after the last full chunk
    if buffer:
        part_file_name = f"{base_name}.part{part_number}{ext}"
        record_count = write_and_count(buffer, part_file_name)
        if record_count &amp;gt; 0:
            file_records[part_file_name] = record_count

    total_records = sum(file_records.values())
    for file_name, count in file_records.items():
        print(f"{file_name}: {count} records")
    print(f"Total records: {total_records}")

def write_and_count(data, file_path):
    # Only write if data is not empty
    if data.strip():
        with open(file_path, 'w') as outfile:
            outfile.write(data)
        return count_records(file_path)
    return 0

def count_records(file_path):
    with open(file_path, 'r') as infile:
        return sum(1 for line in infile if line.strip())  # Count non-empty lines

container_name = 'marketaccess'  # replace with your actual container name
input_file_path = f"/dbfs/mnt/{container_name}/pn_2022_paticd_diag.txt"

split_file_by_size(input_file_path)&lt;/LI-CODE&gt;&lt;P&gt;It does not raise any error, but the data is only partially loaded.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rohit&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:10:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80077#M35914</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-07-23T07:10:46Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80079#M35916</link>
      <description>&lt;P&gt;I'm a bit confused, to be honest. This is neither PySpark nor really Databricks-specific code; it is simply Python.&lt;/P&gt;&lt;P&gt;Do you want to migrate it to PySpark and make real use of your Spark infrastructure?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:18:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80079#M35916</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-07-23T07:18:30Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80080#M35917</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107959"&gt;@Witold&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7159"&gt;@RohitKulkarni&lt;/a&gt;&amp;nbsp;I am also confused now. Given the use case, what is this Python code doing here? Your requirement is simple anyway, but could you please elaborate a little more?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:23:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80080#M35917</guid>
      <dc:creator>Rishabh-Pandey</dc:creator>
      <dc:date>2024-07-23T07:23:23Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80081#M35918</link>
      <description>&lt;P&gt;Honestly, if you read the description carefully, you will understand what I am trying to achieve.&lt;/P&gt;&lt;P&gt;I have a 1.2 GB text file and I am trying to load its data into an Azure SQL database.&lt;/P&gt;&lt;P&gt;Total rows in the file: &lt;SPAN&gt;51303483&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Rows I am able to read and load: &lt;SPAN&gt;10224430&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I first tried the Python script, and it failed.&lt;/P&gt;&lt;P&gt;After that I tried loading via PySpark, and it is still failing.&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;Rohit&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:26:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80081#M35918</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-07-23T07:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80084#M35920</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7159"&gt;@RohitKulkarni&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To be able to help you, we would need to understand at least the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;What does your data look like, i.e. what's inside this txt file? Is it structured or semi-structured data?&lt;/LI&gt;&lt;LI&gt;You only mention that it fails, without giving any specific details. If only a fraction of the initial data is visible in your SQL database, you should be able to tell easily which data worked and which did not. Can you share examples of both with us?&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 23 Jul 2024 07:36:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80084#M35920</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-07-23T07:36:40Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80090#M35922</link>
      <description>&lt;P&gt;It has structured data:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;PAT_KEY|ICD_VERSION|ICD_CODE|ICD_PRI_SEC|ICD_POA|&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;|516351692|10|M12.123|A|Y&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rohit&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 08:09:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80090#M35922</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-07-23T08:09:22Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80110#M35935</link>
      <description>&lt;P&gt;Why don't you simply use Spark to process it? Like:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.option('sep','|').option('header', True).format('csv').load(file_path)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since it appears that you also have a schema, you can avoid inferring it and pass it explicitly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.option('sep','|').option('header', True).schema('PAT_KEY INT, ICD_VERSION INT, ICD_CODE STRING, ICD_PRI_SEC STRING, ICD_POA STRING').format('csv').load(file_path)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then you process it and write it wherever and in whichever format you prefer.&lt;/P&gt;&lt;P&gt;If some of your data is corrupted, i.e. does not match the schema, you might want to look into Auto Loader and its &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema#--what-is-the-rescued-data-column" target="_self"&gt;rescued data&lt;/A&gt; feature.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 10:14:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80110#M35935</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-07-23T10:14:39Z</dc:date>
    </item>
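<!--
A minimal sketch, assuming the pipe-delimited layout posted in this thread: before loading, it can help to check whether every line has the same number of fields as the header, since rows that do not match are a common reason for partial loads. The function name profile_delimited and the sample rows below are hypothetical and only illustrate the idea; nothing here is taken from the original poster's pipeline.

```python
import io
from collections import Counter


def profile_delimited(lines, sep="|", max_report=20):
    """Compare the field count of every line with the header's field count.

    Returns (expected, counts, suspicious), where counts maps a field count
    to the number of data lines having it, and suspicious lists the 1-based
    line numbers of the first few lines that do not match the header.
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n")
    expected = len(header.split(sep))
    counts = Counter()
    suspicious = []
    for lineno, line in enumerate(it, start=2):
        n = len(line.rstrip("\r\n").split(sep))
        counts[n] += 1
        if n != expected and len(suspicious) < max_report:
            suspicious.append(lineno)
    return expected, counts, suspicious


# Hypothetical in-memory sample: one valid row, one blank line, one stray line.
sample = io.StringIO(
    "PAT_KEY|ICD_VERSION|ICD_CODE|ICD_PRI_SEC|ICD_POA|\n"
    "|516351692|10|M12.123|A|Y\n"
    "\n"
    "stray footer line\n"
)
expected, counts, suspicious = profile_delimited(sample)
print(expected, dict(counts), suspicious)
```

On a real file, an open file handle can be passed instead of the in-memory sample, so the file is streamed line by line rather than read into memory at once.
-->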
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80125#M35940</link>
      <description>&lt;P&gt;I already tried the same read.option syntax, but I am still facing the same issue.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 11:42:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80125#M35940</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-07-23T11:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80255#M35971</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7159"&gt;@RohitKulkarni&lt;/a&gt;&amp;nbsp;please explain your case properly then. We have given you a sample answer, not one tailored to your scenario.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jul 2024 05:55:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/80255#M35971</guid>
      <dc:creator>Rishabh-Pandey</dc:creator>
      <dc:date>2024-07-24T05:55:16Z</dc:date>
    </item>
    <item>
      <title>Re: Partially upload data of  1.2GB</title>
      <link>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/85708#M37248</link>
      <description>&lt;P&gt;There were excess lines in the document, and because of this the data was only partially loaded.&lt;/P&gt;&lt;P&gt;Thanks for the support&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2024 08:53:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partially-upload-data-of-1-2gb/m-p/85708#M37248</guid>
      <dc:creator>RohitKulkarni</dc:creator>
      <dc:date>2024-08-28T08:53:15Z</dc:date>
    </item>
  </channel>
</rss>

