<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to download the results in batches in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/107536#M4772</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146566"&gt;@Lumoura&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;One option is to save the result as a CSV to a Volume and then use PySpark to split the file into smaller parts, e.g. using repartition (the Volume path below is a placeholder):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = df.repartition(num_partitions)
df.write.option("header", True).csv("/Volumes/my_catalog/my_schema/my_volume/output")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 29 Jan 2025 10:07:59 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-01-29T10:07:59Z</dc:date>
    <item>
      <title>How to download the results in batches</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/107492#M4770</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello, how are you?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm trying to download some of my results from Databricks, and the sheet is around 300 MB; unfortunately, Google Sheets won't open files larger than 100 MB.&amp;nbsp;Is there any chance I could download the results in batches and join them manually afterwards? (The result is more than 100k lines.)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jan 2025 21:43:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/107492#M4770</guid>
      <dc:creator>Lumoura</dc:creator>
      <dc:date>2025-01-28T21:43:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to download the results in batches</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/107536#M4772</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146566"&gt;@Lumoura&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;One option is to save the result as a CSV to a Volume and then use PySpark to split the file into smaller parts, e.g. using repartition (the Volume path below is a placeholder):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = df.repartition(num_partitions)
df.write.option("header", True).csv("/Volumes/my_catalog/my_schema/my_volume/output")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 10:07:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/107536#M4772</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-01-29T10:07:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to download the results in batches</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/108259#M4784</link>
      <description>&lt;P&gt;Hey,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here are a few more alternatives to repartition:&lt;/P&gt;
&lt;P&gt;1- Use the &lt;CODE&gt;limit&lt;/CODE&gt; and &lt;CODE&gt;offset&lt;/CODE&gt; options in your SQL queries to export data in manageable chunks. For example, if you have a table with 100,000 rows and you want to export 10,000 rows at a time, you can use the following queries:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;SELECT * FROM your_table LIMIT 10000 OFFSET 0;
SELECT * FROM your_table LIMIT 10000 OFFSET 10000;
SELECT * FROM your_table LIMIT 10000 OFFSET 20000;
...&lt;/LI-CODE&gt;
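The sequence of queries above can also be generated programmatically. A minimal sketch in plain Python (the table name, row count, and batch size are placeholder values):

```python
def batched_queries(table, total_rows, batch_size):
    """Generate one LIMIT/OFFSET query per batch of rows."""
    return [
        f"SELECT * FROM {table} LIMIT {batch_size} OFFSET {offset}"
        for offset in range(0, total_rows, batch_size)
    ]

# 100,000 rows in 10,000-row batches gives 10 queries
queries = batched_queries("your_table", 100_000, 10_000)
```

Each query result can then be written out to its own CSV file.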
&lt;P&gt;&amp;nbsp;(Adjust the &lt;CODE&gt;LIMIT&lt;/CODE&gt; and &lt;CODE&gt;OFFSET&lt;/CODE&gt; values based on the size of your data and the number of rows you want to export in each batch. Also add an &lt;CODE&gt;ORDER BY&lt;/CODE&gt; clause so the row order is deterministic and the batches don't overlap or skip rows.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2-&amp;nbsp;&amp;nbsp;Save the results of each query to a separate CSV file. You can use the &lt;CODE&gt;dbutils.fs&lt;/CODE&gt; module to save the results to DBFS (Databricks File System) and then download them to your local machine.&lt;/P&gt;
&lt;P&gt;3 - My personal favourite is splitting. I use it when working with heap dumps, and you could use it here too: once you have the large file, split it into parts.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;split -b 100M large.tar.gz large.tar.gz.part&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Feb 2025 06:24:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-to-download-the-results-in-batches/m-p/108259#M4784</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-02-01T06:24:23Z</dc:date>
    </item>
  </channel>
</rss>

