<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to get git commit ID of the repository the script runs on? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4509#M1212</link>
    <description>&lt;P&gt;I think this is because the code and its execution (the clusters) are separated.&lt;/P&gt;</description>
    <pubDate>Fri, 12 May 2023 08:18:42 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-05-12T08:18:42Z</dc:date>
    <item>
      <title>How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4508#M1211</link>
      <description>&lt;P&gt;I have a script in a repository on Databricks. The script should log the current git commit ID of the repository. How can that be implemented?&lt;/P&gt;&lt;P&gt;I tried various commands, for example:&lt;/P&gt;&lt;P&gt;&lt;I&gt;result = subprocess.run(['git', 'rev-parse', 'HEAD'], stdout=subprocess.PIPE, check=True)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;but I get the following error:&lt;/P&gt;&lt;P&gt;&lt;I&gt;Command '['git', 'rev-parse', 'HEAD']' returned non-zero exit status 128.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Any help with this will be appreciated&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":folded_hands:"&gt;🙏&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2023 19:30:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4508#M1211</guid>
      <dc:creator>annagriv</dc:creator>
      <dc:date>2023-05-11T19:30:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4509#M1212</link>
      <description>&lt;P&gt;I think this is because the code and its execution (the clusters) are separated.&lt;/P&gt;</description>
      <pubDate>Fri, 12 May 2023 08:18:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4509#M1212</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-05-12T08:18:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4510#M1213</link>
      <description>&lt;P&gt;Thanks for your reply. So this means it's impossible to log the git commit ID? Is there no way to pass this information to the cluster?&lt;/P&gt;</description>
      <pubDate>Sun, 14 May 2023 15:20:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4510#M1213</guid>
      <dc:creator>annagriv</dc:creator>
      <dc:date>2023-05-14T15:20:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4511#M1214</link>
      <description>&lt;P&gt;I think that is indeed the case (not 100% sure as I do not know the innards of Databricks).&lt;/P&gt;&lt;P&gt;The on-demand cluster receives a program to be executed and I don't think a git history is passed.&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2023 07:13:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/4511#M1214</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-05-15T07:13:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/110762#M43678</link>
      <description>&lt;P&gt;As of February 2025 this is possible using the Repos REST API (I use the SDK for simplicity) and dbutils. This code snippet assumes that the notebook it is run in sits in the root of the Databricks Repos folder we are interested in.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from databricks.sdk import WorkspaceClient
w = WorkspaceClient()

# Workspace path of the current notebook; its parent folder is taken as the repo root
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
prefix_path = '/'.join(notebook_path.split('/')[:-1])

# First repo whose path starts with that prefix
repo_info = list(w.repos.list(path_prefix=prefix_path))[0]

repo_info.head_commit_id&lt;/LI-CODE&gt;&lt;P&gt;&lt;A href="https://databricks-sdk-py.readthedocs.io/en/latest/" target="_blank"&gt;Databricks SDK for Python (Beta) documentation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/repos.html" target="_blank"&gt;w.repos: Repos (Databricks SDK for Python beta documentation)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Feb 2025 15:09:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/110762#M43678</guid>
      <dc:creator>rich_avery</dc:creator>
      <dc:date>2025-02-20T15:09:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/111955#M44055</link>
      <description>&lt;P&gt;This should be the accepted answer.&lt;/P&gt;&lt;P&gt;A slightly shorter version:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
repo_info = next(w.repos.list(path_prefix=os.getcwd()))
print(repo_info.head_commit_id)&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 07 Mar 2025 02:38:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/111955#M44055</guid>
      <dc:creator>vr</dc:creator>
      <dc:date>2025-03-07T02:38:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to get git commit ID of the repository the script runs on?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/115665#M45150</link>
      <description>&lt;P&gt;Here is a version of &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/42241"&gt;@vr&lt;/a&gt;'s solution that can be run from any folder within the repo. It uses a regex to extract the repo root, which has the form /Repos/&amp;lt;username&amp;gt;/&amp;lt;some-repo&amp;gt;, from the current working directory:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os
import re
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
repo_root = re.search(r'/Repos/[^/]+/[^/]+', os.getcwd()).group(0)
repo_info = next(w.repos.list(path_prefix=repo_root))
print(repo_info.head_commit_id)&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 16 Apr 2025 14:54:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-git-commit-id-of-the-repository-the-script-runs-on/m-p/115665#M45150</guid>
      <dc:creator>bestekov</dc:creator>
      <dc:date>2025-04-16T14:54:33Z</dc:date>
    </item>
  </channel>
</rss>

