Install python package from private repo [CodeArtifact]

pablobd
Contributor

As part of my MLOps stack, I have developed a few packages which are then published to a private AWS CodeArtifact repo. How can I connect the AWS CodeArtifact repo to Databricks? I want to be able to add these packages to the requirements.txt of a model logged in the Model Registry. Is it possible?
Thanks in advance

3 REPLIES

pablobd
Contributor

One way to do it is to run this line before installing the dependencies:
pip config set site.index-url https://aws:$CODEARTIFACT_AUTH_TOKEN@my_domain-111122223333.d.codeartifact.region.amazonaws.com/pypi...

But can we add this in MLflow?
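
As a rough illustration of the command above, here is a minimal Python sketch that assembles the same kind of index URL from its parts. The helper name `codeartifact_index_url`, the region `us-east-1`, the repository name `my_repo`, and the fallback token are assumptions made for the example. The `/pypi/my_repo/simple/` path follows the form given in the reply below, not the truncated URL above.

```python
import os
from urllib.parse import quote


def codeartifact_index_url(domain, account_id, region, repository, token):
    """Build a pip index URL for a CodeArtifact PyPI repository (hypothetical helper)."""
    # Percent-encode the token so special characters cannot break the URL.
    safe_token = quote(token, safe="")
    host = f"{domain}-{account_id}.d.codeartifact.{region}.amazonaws.com"
    # The username is the literal string "aws"; the token acts as the password.
    return f"https://aws:{safe_token}@{host}/pypi/{repository}/simple/"


# Assumption: the token was exported beforehand as CODEARTIFACT_AUTH_TOKEN.
token = os.environ.get("CODEARTIFACT_AUTH_TOKEN", "example-token")
index_url = codeartifact_index_url(
    "my_domain", "111122223333", "us-east-1", "my_repo", token
)

# Equivalent of the shell line above, as an argument list for subprocess.run.
pip_command = ["pip", "config", "set", "site.index-url", index_url]
```

The command is only built here, not executed. Passing `pip_command` to `subprocess.run` would apply it to the current environment.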

Kaniz
Community Manager

Hi @pablobd, thank you for your question. I’m happy to help you with connecting your AWS CodeArtifact repo to Databricks. 😊

There are a few steps you need to follow to achieve this:

  • First, you need to create an HTTPS Git credential in AWS CodeCommit that allows access to your private repo. You can do this by following the instructions in the AWS CodeCommit documentation. The associated IAM user must have “read” and “write” permissions for the repository. You also need to record the password, as you will enter it in Databricks later.
  • Second, you need to configure a remote repo in Databricks that points to your private repo using the HTTPS Git credential. You can do this by following the instructions in the Databricks documentation. You will need to enter the repository URL, which for an AWS CodeCommit repository looks something like this: https://git-codecommit.<region>.amazonaws.com/v1/repos/<repository-name>. You will also need to enter the username and password that you created in AWS CodeCommit.
  • Third, you need to install the packages from your private repo using pip or another package manager. You can do this by running a line like this before installing the dependencies: pip config set site.index-url https://aws:<access-token>@<domain>-<aws-account-id>.d.codeartifact.<region>.amazonaws.com/pypi/my_repo/simple/. The username is the literal string aws. The <access-token> is a CodeArtifact authorization token, which you can obtain with aws codeartifact get-authorization-token. The <domain> is your CodeArtifact domain name, the <aws-account-id> is your AWS account ID, and the <region> is your AWS region.

I hope this helps you with connecting your AWS CodeArtifact repo to Databricks. If you have any further questions, please feel free to ask me. 😊

Got it! Thanks @Kaniz. That's actually the same thing I do to configure the repo on my local computer. However, I need to do this in the Model Registry, so that when a model in UC is deployed to Model Serving, it installs the requirements from my private repo. How can I run this pip config command during the deployment so that pip knows where to install the packages from?

Thanks again in advance!
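
One direction that might be worth trying, though nothing in this thread confirms it works with Model Serving: pip requirements files accept an `--index-url` option line, and pip expands `${VAR}` placeholders in them from the environment. That would put the index location in the requirements file itself instead of a separate pip config step. The sketch below only renders such a file as text. The helper `render_requirements`, the package names, and the repo name `my_repo` are hypothetical, and it is an untested assumption that `CODEARTIFACT_AUTH_TOKEN` would be set wherever the requirements get installed.

```python
def render_requirements(index_url_template, packages):
    """Render requirements.txt text with an index option line (hypothetical helper)."""
    # An option line at the top applies to every requirement in the file.
    lines = [f"--index-url {index_url_template}"]
    lines.extend(packages)
    return "\n".join(lines) + "\n"


# ${CODEARTIFACT_AUTH_TOKEN} is left as a placeholder on purpose, so that
# the token itself is never written into the file.
template = (
    "https://aws:${CODEARTIFACT_AUTH_TOKEN}"
    "@my_domain-111122223333.d.codeartifact.region.amazonaws.com"
    "/pypi/my_repo/simple/"
)

# Hypothetical package names, for illustration only.
requirements_text = render_requirements(
    template, ["my-private-package==1.0.0", "pandas"]
)
print(requirements_text)
```

With `--index-url`, pip resolves every package through the CodeArtifact repo, so public packages such as pandas must also be reachable from it. Using `--extra-index-url` instead keeps the default index as well.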