<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic GitLab Integration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/gitlab-integration/m-p/113402#M44524</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":waving_hand:"&gt;👋&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I'm struggling with the GitLab integration in Databricks.&lt;BR /&gt;I've got jobs that run daily, pointing directly to .py files in my repo. To make this work, my GitLab account is linked to Databricks with a PAT that expires within a month.&lt;BR /&gt;&lt;BR /&gt;But every other day (at least once a week), I get the same error when scheduled jobs run:&lt;BR /&gt;&lt;SPAN&gt;Failed to checkout Git repository: UNAUTHENTICATED: Invalid Git provider Personal Access Token credentials for repository URL.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;This happens even though the PAT has not expired.&lt;BR /&gt;Usually I simply renew my PAT and update my linked account in the Databricks settings.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Is there a better way to link my repo to our Databricks instance?&amp;nbsp;&lt;BR /&gt;Ideally it wouldn't be through a PAT, since that depends on a user, and I'd like it to be more stable.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your help,&amp;nbsp;&lt;BR /&gt;Florian&lt;/P&gt;</description>
    <pubDate>Mon, 24 Mar 2025 09:56:21 GMT</pubDate>
    <dc:creator>flodoamaral</dc:creator>
    <dc:date>2025-03-24T09:56:21Z</dc:date>
    <item>
      <title>GitLab Integration</title>
      <link>https://community.databricks.com/t5/data-engineering/gitlab-integration/m-p/113402#M44524</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":waving_hand:"&gt;👋&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I'm struggling with the GitLab integration in Databricks.&lt;BR /&gt;I've got jobs that run daily, pointing directly to .py files in my repo. To make this work, my GitLab account is linked to Databricks with a PAT that expires within a month.&lt;BR /&gt;&lt;BR /&gt;But every other day (at least once a week), I get the same error when scheduled jobs run:&lt;BR /&gt;&lt;SPAN&gt;Failed to checkout Git repository: UNAUTHENTICATED: Invalid Git provider Personal Access Token credentials for repository URL.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;This happens even though the PAT has not expired.&lt;BR /&gt;Usually I simply renew my PAT and update my linked account in the Databricks settings.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Is there a better way to link my repo to our Databricks instance?&amp;nbsp;&lt;BR /&gt;Ideally it wouldn't be through a PAT, since that depends on a user, and I'd like it to be more stable.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your help,&amp;nbsp;&lt;BR /&gt;Florian&lt;/P&gt;</description>
      <pubDate>Mon, 24 Mar 2025 09:56:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gitlab-integration/m-p/113402#M44524</guid>
      <dc:creator>flodoamaral</dc:creator>
      <dc:date>2025-03-24T09:56:21Z</dc:date>
    </item>
    <item>
      <title>Re: GitLab Integration</title>
      <link>https://community.databricks.com/t5/data-engineering/gitlab-integration/m-p/136637#M50622</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The error you are experiencing—"UNAUTHENTICATED: Invalid Git provider Personal Access Token credentials for repository URL"—is a common pain point when integrating GitLab repos with Databricks using Personal Access Tokens (PATs), especially for scheduled jobs and automation. While using a PAT can work, it's not as stable as you need because tokens are tied to users, expire, and occasionally become invalid for reasons beyond just manual expiry (e.g., changes in permissions, security policies, or intermittent authentication issues).​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Alternatives to PATs for Databricks-GitLab Integration&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Unfortunately, Databricks’ more robust integration options (such as OAuth-based authentication and service principals) are currently available primarily for GitHub and Azure DevOps, not GitLab. OAuth 2.0 and service principal setups help decouple repo access from individual user accounts and avoid problems with expiring PATs, but as of now, GitLab support for these mechanisms in Databricks is limited or not natively available.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Best Practices &amp;amp; Workarounds&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;PAT Management:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Databricks currently only supports user-level PATs for GitLab (not project or group tokens), so you must manage these at the user level.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;CI/CD Approach:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Many teams set up an automated GitLab runner or pipeline to sync code between GitLab and Databricks, using the Databricks CLI authenticated with environment variables like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;DATABRICKS_HOST&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;DATABRICKS_TOKEN&lt;/CODE&gt;. This keeps code sync outside Databricks’ scheduled jobs and is more stable if your pipeline runner manages the tokens securely.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Databricks CLI Unified Auth:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;For pipelined code sync, use environment variable authentication (setting&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;DATABRICKS_HOST&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;DATABRICKS_TOKEN&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in your CI/CD environment) instead of storing PATs in Databricks “Linked Accounts.” This is more reliable for automation.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Re-clone on Failure:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;As a short-term fix, if you encounter repo authentication errors, deleting and re-cloning the repo in Databricks can reset its internal connection state.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
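&lt;P&gt;The CI/CD sync described in the list above could look roughly like the following Python sketch. It is an illustration under assumptions, not a Databricks-provided script: it reads &lt;CODE&gt;DATABRICKS_HOST&lt;/CODE&gt; and &lt;CODE&gt;DATABRICKS_TOKEN&lt;/CODE&gt; from the pipeline environment and uploads one file through the Workspace import REST endpoint (&lt;CODE&gt;/api/2.0/workspace/import&lt;/CODE&gt;). The function names, the example paths, and the choice of the &lt;CODE&gt;AUTO&lt;/CODE&gt; import format are assumptions made for illustration; please check them against the REST API reference for your workspace.&lt;/P&gt;

```python
import base64
import json
import os
import urllib.request


def build_import_request(host, token, workspace_path, source_code):
    """Build (but do not send) a POST to the Workspace import endpoint.

    Assumption: format "AUTO" lets Databricks decide from the extension and
    header whether the upload becomes a workspace file or a notebook.
    """
    payload = {
        "path": workspace_path,
        "format": "AUTO",
        "overwrite": True,
        # The endpoint expects the file content as base64 text.
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
    }
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def sync_file(local_path, workspace_path):
    """Upload one local file using credentials from the CI/CD environment."""
    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]
    with open(local_path, encoding="utf-8") as handle:
        request = build_import_request(host, token, workspace_path, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

&lt;P&gt;A GitLab pipeline job could call, for example, &lt;CODE&gt;sync_file("jobs/etl.py", "/Workspace/Shared/jobs/etl.py")&lt;/CODE&gt; after each merge (both paths are hypothetical). The scheduled Databricks job would then point at the workspace copy, so no Git checkout happens at run time.&lt;/P&gt;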
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Recommended Approach&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Given the current limitations:&lt;/P&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Automate via GitLab CI/CD:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Run a job using the Databricks CLI or REST API to sync your code from GitLab to Databricks, using non-interactive tokens stored in your CI/CD environment as environment variables.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Centralized Tokens:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Create a dedicated “service account” user in GitLab and Databricks, and use that account’s PAT for automation to reduce the chance of disruptions from individual user actions.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Monitor Upstream Updates:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Watch for future Databricks announcements regarding OAuth or service principal support for GitLab—this would enable a fully robust, PAT-free integration.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
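&lt;P&gt;If the jobs keep a Git source, the PAT of the dedicated service account still has to be replaced in Databricks before it expires. Below is a minimal sketch of that rotation step, assuming the Git Credentials REST API (&lt;CODE&gt;PATCH /api/2.0/git-credentials/{credential_id}&lt;/CODE&gt;) and the provider value &lt;CODE&gt;gitLab&lt;/CODE&gt;. The function name and every argument value are hypothetical, and the field names should be verified against the current API reference before use. Because Git credentials are stored per user, the Databricks token used here would have to belong to the service account itself.&lt;/P&gt;

```python
import json
import urllib.request


def build_credential_update(host, databricks_token, credential_id,
                            git_username, new_gitlab_pat):
    """Build (but do not send) a PATCH that swaps in a freshly issued GitLab PAT.

    Assumption: the payload field names follow the Git Credentials API;
    confirm them in the API reference for your workspace.
    """
    payload = {
        "git_provider": "gitLab",
        "git_username": git_username,
        "personal_access_token": new_gitlab_pat,
    }
    base_url = host.rstrip("/")
    return urllib.request.Request(
        url=f"{base_url}/api/2.0/git-credentials/{credential_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + databricks_token,
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

&lt;P&gt;Running this from a scheduled pipeline shortly before the old PAT expires would remove the manual "renew and update linked account" step, although it does not explain why a still-valid token is occasionally rejected.&lt;/P&gt;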
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;At this time, there is no built-in way to link a GitLab repo to Databricks for scheduled jobs&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;without&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;using a PAT or equivalent per-user credential. Using CI/CD for code sync and minimizing direct repo linking in Databricks jobs is the best practice for long-term stability.​&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2025 20:40:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gitlab-integration/m-p/136637#M50622</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-29T20:40:01Z</dc:date>
    </item>
  </channel>
</rss>

