<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH) in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104507#M2693</link>
    <description>&lt;DIV class=""&gt;Hello,&lt;BR /&gt;We intend to deploy a Databricks workflow based on a Python wheel file which needs to run on a job cluster. There is a dependency declared in pyproject.toml which is another Python project living in a private Gitlab repository. We therefore need to provide secure access to our Gitlab domain to the cluster.&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;We do not want to declare the dependency URL in pyproject.toml/requirements.txt in a form containing credentials. The .whl file metadata needs to be devoid of any credentials.&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;However, the library still needs to be declared as a dependency of the main code somehow, for proper downstream dependency management.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;There are two repository-specific ways to do this in an automated fashion: Either via a &lt;STRONG&gt;deploy key&lt;/STRONG&gt;, which is a project-specific SSH key, or via a &lt;STRONG&gt;deploy token&lt;/STRONG&gt; for HTTPS access. Both ways would work, provided we can get these credentials to be usable on the nodes. Unfortunately, it is being difficult.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;This is how I have proceeded so far to try out both ways (still not super secure, as working with plain-text secrets, but these are first steps):&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;H4&gt;Step 1&lt;/H4&gt;&lt;DIV class=""&gt;I created a deploy key pair and added the public key to the Gitlab project. 
I also created a deploy token.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;H4&gt;Step 2&lt;/H4&gt;&lt;DIV class=""&gt;I created a secret scope containing the secrets (private key of deploy key pair and deploy token username/token pair) and configured the job cluster to contain the following environment variables, so they can be used in initialization:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MY_DEPLOY_KEY: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-key}}'
MY_DEPLOY_TOKEN: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-token}}'
MY_DEPLOY_TOKEN_USER: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-token-user}}'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 3a (SSH)&lt;/H4&gt;&lt;DIV class=""&gt;The job cluster init script configures SSH to use the MY_DEPLOY_KEY variable for our Gitlab:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mkdir -p /root/.ssh/gitlab
echo "${MY_DEPLOY_KEY}" &amp;gt; /root/.ssh/gitlab/id_rsa  # Add private key
ssh-keyscan gitlab &amp;gt; /root/.ssh/known_hosts  # Get host keys from server

# Configure SSH to use the private key for Gitlab:
cat &amp;lt;&amp;lt; EOL &amp;gt; /root/.ssh/config
Host gitlab
  HostName &amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;
  GlobalKnownHostsFile=/root/.ssh/known_hosts
  User git
  IdentityFile /root/.ssh/gitlab/id_rsa
EOL
chmod 600 /root/.ssh/gitlab/id_rsa /root/.ssh/known_hosts /root/.ssh/config&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;When running this in a cell on an interactive cluster, this works, I can access the repository. It also works on my local computer. The job cluster however fails to install the dependency from the repository because the host cannot be verified. This is the log4j output:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Host key verification failed.
  fatal: Could not read from remote repository.

  Please make sure you have the correct access rights
  and the repository exists.
  error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet 'ssh://****@&amp;lt;OUR_GITLAB_DOMAIN&amp;gt;/&amp;lt;...&amp;gt;.git' /tmp/pip-install-dpz6wll5/&amp;lt;...&amp;gt; did not run successfully.
  │ exit code: 128
  ╰─&amp;gt; See above for output.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 3b (HTTPS)&lt;/H4&gt;&lt;DIV class=""&gt;The job cluster init script configures Git to use the MY_DEPLOY_TOKEN_USER and MY_DEPLOY_TOKEN variables for Gitlab:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;git config --global --add credential.helper store
cat &amp;lt;&amp;lt; EOL &amp;gt; /root/.git-credentials
https://${MY_DEPLOY_TOKEN_USER}:${MY_DEPLOY_TOKEN}@&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;
EOL
chmod 600 /root/.git-credentials&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;Again, this works when executed on an interactive cluster, but on a job cluster, the library installation fails. Log4j output:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;fatal: could not read Username for 'https://&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;': No such device or address
  error: subprocess-exited-with-error&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 4 (further checks)&lt;/H4&gt;&lt;P&gt;I also performed the following checks:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The files created by the init scripts do not get deleted after the script finishes executing, so the environment does not get "refreshed" between init script execution and wheel file attachment to the job.&lt;/LI&gt;&lt;LI&gt;Installing the libraries directly from inside the init scripts works when setting up the credentials in these two ways. However, this is not how we want to do it.&lt;/LI&gt;&lt;LI&gt;Installing the libraries when the main code is already running also works (e.g. running subprocess(pip install ...)).&lt;/LI&gt;&lt;/UL&gt;&lt;DIV class=""&gt;A different method I thought of but did not test yet is the package registry feature offered by Gitlab. A built artifact can be registered there and one could create a pip.conf file on the node with the registry URL and its credentials as an extra URL. However, I doubt that this works, because under the hood, I guess the same commands are executed.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;What is happening between init script execution and wheel file attachment which could block the access to stored Git credentials on a job cluster?&lt;/DIV&gt;&lt;DIV class=""&gt;What are best practices for securely accessing private repositories on job clusters?&lt;/DIV&gt;</description>
    <pubDate>Tue, 07 Jan 2025 12:40:25 GMT</pubDate>
    <dc:creator>andre-h</dc:creator>
    <dc:date>2025-01-07T12:40:25Z</dc:date>
    <item>
      <title>Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH)</title>
      <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104507#M2693</link>
      <description>&lt;DIV class=""&gt;Hello,&lt;BR /&gt;We intend to deploy a Databricks workflow based on a Python wheel file which needs to run on a job cluster. There is a dependency declared in pyproject.toml which is another Python project living in a private Gitlab repository. We therefore need to provide secure access to our Gitlab domain to the cluster.&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;We do not want to declare the dependency URL in pyproject.toml/requirements.txt in a form containing credentials. The .whl file metadata needs to be devoid of any credentials.&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;However, the library still needs to be declared as a dependency of the main code somehow, for proper downstream dependency management.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;There are two repository-specific ways to do this in an automated fashion: Either via a &lt;STRONG&gt;deploy key&lt;/STRONG&gt;, which is a project-specific SSH key, or via a &lt;STRONG&gt;deploy token&lt;/STRONG&gt; for HTTPS access. Both ways would work, provided we can get these credentials to be usable on the nodes. Unfortunately, it is being difficult.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;This is how I have proceeded so far to try out both ways (still not super secure, as working with plain-text secrets, but these are first steps):&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;H4&gt;Step 1&lt;/H4&gt;&lt;DIV class=""&gt;I created a deploy key pair and added the public key to the Gitlab project. 
I also created a deploy token.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;H4&gt;Step 2&lt;/H4&gt;&lt;DIV class=""&gt;I created a secret scope containing the secrets (the private key of the deploy key pair and the deploy token username/token pair) and configured the job cluster with the following environment variables, so they can be used during initialization:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MY_DEPLOY_KEY: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-key}}'
MY_DEPLOY_TOKEN: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-token}}'
MY_DEPLOY_TOKEN_USER: '{{secrets/&amp;lt;MY_SECRET_SCOPE&amp;gt;/my-deploy-token-user}}'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 3a (SSH)&lt;/H4&gt;&lt;DIV class=""&gt;The job cluster init script configures SSH to use the MY_DEPLOY_KEY variable for our GitLab host:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mkdir -p /root/.ssh/gitlab
echo "${MY_DEPLOY_KEY}" &amp;gt; /root/.ssh/gitlab/id_rsa  # Add private key
ssh-keyscan gitlab &amp;gt; /root/.ssh/known_hosts  # Get host keys from server

# Configure SSH to use the private key for GitLab:
cat &amp;lt;&amp;lt; EOL &amp;gt; /root/.ssh/config
Host gitlab
  HostName &amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;
  GlobalKnownHostsFile=/root/.ssh/known_hosts
  User git
  IdentityFile /root/.ssh/gitlab/id_rsa
EOL
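# Hedged addition, not part of the original script: probe the connection the
# same way pip's git subprocess will use it. BatchMode=yes forbids interactive
# prompts, so a host-key or key-auth problem surfaces here in the init script
# log instead of at library-installation time.
ssh -o BatchMode=yes -T gitlab || echo "WARNING: SSH probe to gitlab failed (exit $?)"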
chmod 600 /root/.ssh/gitlab/id_rsa /root/.ssh/known_hosts /root/.ssh/config&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;When run in a cell on an interactive cluster, this works: I can access the repository. It also works on my local computer. The job cluster, however, fails to install the dependency from the repository because the host cannot be verified. This is the log4j output:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Host key verification failed.
  fatal: Could not read from remote repository.

  Please make sure you have the correct access rights
  and the repository exists.
  error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet 'ssh://****@&amp;lt;OUR_GITLAB_DOMAIN&amp;gt;/&amp;lt;...&amp;gt;.git' /tmp/pip-install-dpz6wll5/&amp;lt;...&amp;gt; did not run successfully.
  │ exit code: 128
  ╰─&amp;gt; See above for output.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 3b (HTTPS)&lt;/H4&gt;&lt;DIV class=""&gt;The job cluster init script configures Git to use the MY_DEPLOY_TOKEN_USER and MY_DEPLOY_TOKEN variables for GitLab:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;git config --global --add credential.helper store
cat &amp;lt;&amp;lt; EOL &amp;gt; /root/.git-credentials
https://${MY_DEPLOY_TOKEN_USER}:${MY_DEPLOY_TOKEN}@&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;
EOL
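# Hedged addition, not part of the original script: verify the stored token
# non-interactively. GIT_TERMINAL_PROMPT=0 makes git fail fast rather than
# prompt, matching how it runs under pip on the job cluster.
# (&amp;lt;GROUP/PROJECT&amp;gt; is a hypothetical placeholder for a real repository path.)
GIT_TERMINAL_PROMPT=0 git ls-remote https://&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;/&amp;lt;GROUP/PROJECT&amp;gt;.git HEAD || echo "WARNING: HTTPS probe failed (exit $?)"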
chmod 600 /root/.git-credentials&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;Again, this works when executed on an interactive cluster, but on a job cluster the library installation fails. Log4j output:&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;fatal: could not read Username for 'https://&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;': No such device or address
  error: subprocess-exited-with-error&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;Step 4 (further checks)&lt;/H4&gt;&lt;P&gt;I also performed the following checks:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The files created by the init scripts do not get deleted after the scripts finish executing, so the environment does not get "refreshed" between init script execution and wheel file attachment to the job.&lt;/LI&gt;&lt;LI&gt;Installing the libraries directly from inside the init scripts works with the credentials set up in either of these two ways. However, this is not how we want to do it.&lt;/LI&gt;&lt;LI&gt;Installing the libraries while the main code is already running also works (e.g. via subprocess.run(["pip", "install", ...])).&lt;/LI&gt;&lt;/UL&gt;&lt;DIV class=""&gt;A different method I thought of but have not tested yet is the package registry feature offered by GitLab. A built artifact can be registered there, and one could create a pip.conf file on the node with the registry URL and its credentials as an extra index URL. However, I doubt this works, because I suspect the same commands are executed under the hood.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;What happens between init script execution and wheel file attachment that could block access to the stored Git credentials on a job cluster?&lt;/DIV&gt;&lt;DIV class=""&gt;What are best practices for securely accessing private repositories on job clusters?&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Jan 2025 12:40:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104507#M2693</guid>
      <dc:creator>andre-h</dc:creator>
      <dc:date>2025-01-07T12:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH)</title>
      <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104510#M2695</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Between the execution of the init script and the wheel file attachment on a job cluster, there are several factors that could block access to stored Git credentials:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Environment Isolation&lt;/STRONG&gt;: Job clusters are designed to be ephemeral and isolated. This means that any environment setup done in the init script might not persist or be accessible when the job runs. This isolation ensures that each job runs in a clean environment, which can lead to the loss of any temporary configurations or credentials set up during the init script execution.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Credential Storage&lt;/STRONG&gt;: The credentials set up in the init script might not be stored in a way that they are accessible to the job tasks. For example, if the credentials are written to a file, the job tasks might not have the necessary permissions or paths to access these files.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Network Configuration&lt;/STRONG&gt;: The network configuration on job clusters might be different from interactive clusters. This can affect the ability to verify host keys or access external repositories, leading to issues like "Host key verification failed."&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Security Policies&lt;/STRONG&gt;: Job clusters might have stricter security policies that prevent the use of certain credentials or access methods. This can include restrictions on SSH keys or HTTPS tokens, leading to failures in accessing private repositories.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
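&lt;P class="_1t7bu9h1 paragraph"&gt;As a concrete sketch of the package-registry idea raised in this thread (and covered again under the best practices below), the node-side setup could look roughly as follows. This is a hedged sketch, not a verified recipe: the project ID and secret-backed variable names are placeholders, and it assumes GitLab's PyPI-compatible index served under the project's &lt;CODE&gt;packages/pypi/simple&lt;/CODE&gt; API path:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# In the init script: point pip at the private registry as an extra index.
# Credentials come from environment variables backed by Databricks secrets.
cat &amp;lt;&amp;lt; EOL &amp;gt; /etc/pip.conf
[global]
extra-index-url = https://${MY_DEPLOY_TOKEN_USER}:${MY_DEPLOY_TOKEN}@&amp;lt;OUR_GITLAB_HOSTNAME&amp;gt;/api/v4/projects/&amp;lt;PROJECT_ID&amp;gt;/packages/pypi/simple
EOL
chmod 600 /etc/pip.conf&lt;/LI-CODE&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;With something of this shape in place, the dependency can stay declared by name in pyproject.toml, and no credentials end up in the wheel metadata.&lt;/P&gt;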
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Best Practices for Securely Accessing Private Repositories on Job Clusters&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Use Databricks Secrets&lt;/STRONG&gt;: Store your Git credentials (SSH keys or HTTPS tokens) in Databricks Secrets. This ensures that the credentials are securely managed and can be accessed by the job tasks without being exposed in the init scripts.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Environment Variables&lt;/STRONG&gt;: Use environment variables to pass credentials to the job tasks. This can be done by setting the environment variables in the init script and ensuring that the job tasks are configured to read these variables.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Databricks Repos&lt;/STRONG&gt;: Use Databricks Repos to manage your code. Databricks Repos integrates with Git providers and handles the authentication and access management, reducing the need to manually manage credentials.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Cluster Policies&lt;/STRONG&gt;: Define cluster policies that ensure the necessary configurations and credentials are set up correctly for job clusters. This can help enforce consistent and secure access to private repositories.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Package Registry&lt;/STRONG&gt;: Consider using a package registry feature offered by GitLab. You can register built artifacts and create a &lt;CODE&gt;pip.conf&lt;/CODE&gt; file on the node with the registry URL and its credentials as an extra URL. This method can help manage dependencies more securely and efficiently.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Tue, 07 Jan 2025 16:13:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104510#M2695</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-07T16:13:28Z</dc:date>
    </item>
    <item>
      <title>Re: Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH)</title>
      <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104545#M2699</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/88823"&gt;@Walter_C&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;OL&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Environment Isolation&lt;/STRONG&gt;: Job clusters are designed to be ephemeral and isolated. This means that any environment setup done in the init script might not persist or be accessible when the job runs. This isolation ensures that each job runs in a clean environment, which can lead to the loss of any temporary configurations or credentials set up during the init script execution.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;I think this is the actual cause, but it would be great to get a definitive statement on this.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 15:29:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104545#M2699</guid>
      <dc:creator>andre-h</dc:creator>
      <dc:date>2025-01-07T15:29:06Z</dc:date>
    </item>
    <item>
      <title>Re: Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH)</title>
      <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104759#M2715</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/140834"&gt;@andre-h&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;As a good alternative, you can build the Python package (wheel or egg) in your GitLab or GitHub workflows and upload it to a dedicated cloud storage bucket. You can then specify the cloud storage path of your Python library in the job dependencies, and it will be installed dynamically on your job cluster when the job is triggered.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 18:34:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104759#M2715</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-08T18:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: Install Python dependency on job cluster from a privately hosted GitLab repository (HTTPS/SSH)</title>
      <link>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104939#M2733</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98469"&gt;@hari-prasad&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks, that sounds like a very good solution as well &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I managed to get it to run by using the GitLab package registry as a private PyPI index and creating a pip.conf file with the credentials for HTTPS access in the initialization script. As I wrote, I would not have expected it to work, but apparently this is the only way to make a custom library an integral part of your dependencies.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jan 2025 12:59:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/install-python-dependency-on-job-cluster-from-a-privately-hosted/m-p/104939#M2733</guid>
      <dc:creator>andre-h</dc:creator>
      <dc:date>2025-01-09T12:59:43Z</dc:date>
    </item>
  </channel>
</rss>

