<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Work-around for cloning a repo with notebooks too big in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157794#M5283</link>
    <description></description>
    <pubDate>Thu, 28 May 2026 13:14:21 GMT</pubDate>
    <dc:creator>Ashwin_DSA</dc:creator>
    <dc:date>2026-05-28T13:14:21Z</dc:date>
    <item>
      <title>Work-around for cloning a repo with notebooks too big</title>
      <link>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157786#M5282</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;It would be nice to be able to clone&amp;nbsp;&lt;A href="https://github.com/shap/shap" target="_blank"&gt;https://github.com/shap/shap&lt;/A&gt;&amp;nbsp;into Databricks, it being such a standard.&amp;nbsp; But it fails because some of the notebooks violate a Databricks size limit.&amp;nbsp; Using sparse clone mode I can force my way through getting most of the repo, but is there another solution?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 28 May 2026 12:10:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157786#M5282</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2026-05-28T12:10:43Z</dc:date>
    </item>
    <item>
      <title>Re: Work-around for cloning a repo with notebooks too big</title>
      <link>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157794#M5283</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/137009"&gt;@dkxxx-rc&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;One option besides sparse checkout is to use a Git folder with Git CLI access. The public docs explain that standard Git folders are constrained by per-operation limits, and for larger repos, Databricks recommends either &lt;A href="https://docs.databricks.com/aws/en/repos/git-operations-with-repos#sparse" rel="noopener noreferrer nofollow" target="_blank"&gt;sparse checkout&lt;/A&gt; or &lt;A href="https://docs.databricks.com/aws/en/repos/git-operations-with-repos#use-git-cli" rel="noopener noreferrer nofollow" target="_blank"&gt;Git CLI commands&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;More specifically, the docs say Git CLI-enabled folders can work with repositories that exceed the 2 GB memory and 4 GB disk limits of standard Git folders, so that’s probably the next thing I’d suggest trying if sparse checkout is only a partial workaround. You can see that called out in &lt;A href="https://docs.databricks.com/aws/en/repos/git-operations-with-repos" rel="noopener noreferrer nofollow" target="_blank"&gt;Create and manage Git folders&lt;/A&gt; and in the &lt;A href="https://docs.databricks.com/aws/en/repos/limits" rel="noopener noreferrer nofollow" target="_blank"&gt;Git folder limits&lt;/A&gt; page.&lt;/P&gt;
&lt;P&gt;One caveat: you can’t turn on Git CLI support for an existing Git folder, so this would need to be a fresh clone created with Git CLI access enabled.&lt;/P&gt;
&lt;P&gt;If the underlying issue is specifically a few oversized notebooks or other large committed files, the docs also note that adding them to .gitignore won’t shrink the repo once they’re already in history. In that case, the more durable fix is to remove those files from history with something like git filter-repo, or split out the problematic content.&lt;/P&gt;
</description>
      <pubDate>Thu, 28 May 2026 13:14:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157794#M5283</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-05-28T13:14:21Z</dc:date>
    </item>
    <item>
      <title>Re: Work-around for cloning a repo with notebooks too big</title>
      <link>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157819#M5291</link>
      <description>&lt;P&gt;This looks sufficient, thanks.&amp;nbsp; I was able to run a clone in the CLI.&amp;nbsp; The folder ends up in an invalid git state, but that appears to be because the beta feature isn't turned on for my workspace yet.&lt;/P&gt;</description>
      <pubDate>Thu, 28 May 2026 17:02:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/work-around-for-cloning-a-repo-with-notebooks-too-big/m-p/157819#M5291</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2026-05-28T17:02:14Z</dc:date>
    </item>
  </channel>
</rss>

