<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Write to csv file in S3 bucket in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/write-to-csv-file-in-s3-bucket/m-p/66030#M32990</link>
    <description>&lt;P&gt;I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it:&lt;/P&gt;&lt;P&gt;import boto3&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; s3fs&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_summary.&lt;/SPAN&gt;&lt;SPAN&gt;to_csv&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"s3://dataconversion/data/exclude"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;index&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;False&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;but I keep getting this error:&amp;nbsp;&lt;SPAN class=""&gt;ModuleNotFoundError: &lt;/SPAN&gt;&lt;SPAN&gt;No module named 'botocore.compress'&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I already tried upgrading boto3, but I get the same error. The problem seems to be limited to pandas; I'm able to read CSV files with spark.read.format('csv') without issues.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any suggestions?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 10 Apr 2024 17:46:45 GMT</pubDate>
    <dc:creator>mh_db</dc:creator>
    <dc:date>2024-04-10T17:46:45Z</dc:date>
    <item>
      <title>Write to csv file in S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/write-to-csv-file-in-s3-bucket/m-p/66030#M32990</link>
      <description>&lt;P&gt;I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it:&lt;/P&gt;&lt;P&gt;import boto3&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; s3fs&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_summary.&lt;/SPAN&gt;&lt;SPAN&gt;to_csv&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"s3://dataconversion/data/exclude"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;index&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;False&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;but I keep getting this error:&amp;nbsp;&lt;SPAN class=""&gt;ModuleNotFoundError: &lt;/SPAN&gt;&lt;SPAN&gt;No module named 'botocore.compress'&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I already tried upgrading boto3, but I get the same error. The problem seems to be limited to pandas; I'm able to read CSV files with spark.read.format('csv') without issues.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any suggestions?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 17:46:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-to-csv-file-in-s3-bucket/m-p/66030#M32990</guid>
      <dc:creator>mh_db</dc:creator>
      <dc:date>2024-04-10T17:46:45Z</dc:date>
    </item>
    <item>
      <title>Re: Write to csv file in S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/write-to-csv-file-in-s3-bucket/m-p/66031#M32991</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103459"&gt;@mh_db&lt;/a&gt;&amp;nbsp;- you can import the botocore library or, if it is not found, run pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame instead of converting it to a pandas DataFrame, and use coalesce(1) when writing to produce a single CSV file (depending on your requirements).&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 18:56:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-to-csv-file-in-s3-bucket/m-p/66031#M32991</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-04-10T18:56:58Z</dc:date>
    </item>
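Before reinstalling anything, it may help to confirm what the notebook's Python environment actually contains. The sketch below is not from the thread: it uses only the standard library to check whether the `botocore.compress` submodule can be located and to list the installed versions of the packages involved. The helper names (`module_available`, `installed_version`, `report`) are hypothetical, and including `aiobotocore` in the list is an assumption based on s3fs depending on it.

```python
from importlib import metadata, util
from typing import Dict, Optional, Sequence


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    For a dotted name, find_spec imports the parent packages but not
    the submodule itself.
    """
    try:
        return util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. botocore itself) is missing.
        return False


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of distribution `dist`, or None."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def report(
    packages: Sequence[str] = ("boto3", "botocore", "s3fs", "aiobotocore"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None if absent)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    print("botocore.compress importable:", module_available("botocore.compress"))
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If `botocore.compress` is reported as not importable while boto3 was recently upgraded, one plausible explanation is that boto3 and botocore are out of step in that environment. In that case, installing matching versions in the same environment the notebook runs in (for example with `%pip install`) is likely the direction the reply points to.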
  </channel>
</rss>

