Install python package from private repo [CodeArtifact]

pablobd
Contributor II

As part of my MLOps stack, I have developed a few packages which are then published to a private AWS CodeArtifact repo. How can I connect the AWS CodeArtifact repo to Databricks? I want to be able to add these packages to the requirements.txt of a model logged in the model registry. Is it possible?
Thanks in advance


pablobd
Contributor II

One way to do it is to run this line before installing the dependencies:
pip config set global.index-url https://aws:$CODEARTIFACT_AUTH_TOKEN@my_domain-111122223333.d.codeartifact.region.amazonaws.com/pypi...
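For reference, here is a minimal Python sketch of the same configuration step, assuming the placeholders from the command above (my_domain, 111122223333, region, my_repo) and a token already stored in CODEARTIFACT_AUTH_TOKEN. The helper names are hypothetical. It builds the index URL (CodeArtifact expects the literal user name aws with the token as the password) and the argument list for running pip config through the current interpreter.

```python
import sys
from typing import List
from urllib.parse import quote


def codeartifact_index_url(domain: str, owner: str, region: str, repo: str, token: str) -> str:
    """Build the pip index URL for a CodeArtifact PyPI repository (hypothetical helper)."""
    # User name is the literal "aws"; the token is the password, percent-encoded to be safe.
    return (
        f"https://aws:{quote(token, safe='')}@{domain}-{owner}"
        f".d.codeartifact.{region}.amazonaws.com/pypi/{repo}/simple/"
    )


def pip_config_command(index_url: str) -> List[str]:
    """Argument list equivalent to `pip config set global.index-url <url>`."""
    # Using sys.executable makes sure the pip of the running interpreter is configured.
    return [sys.executable, "-m", "pip", "config", "set", "global.index-url", index_url]
```

Usage would be something like subprocess.run(pip_config_command(url), check=True) before installing the dependencies. Note that the resulting pip configuration contains the token in plain text and stops working once the token expires.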

But can this be added in MLflow?

Hi @pablobd, thank you for your question. I'm happy to help you connect your AWS CodeArtifact repo to Databricks. 😊


There are a few steps you need to follow to achieve this:

  • First, make sure the IAM identity that will install the packages (for example, the instance profile or access keys used by your Databricks cluster) can read from the repository. At minimum it needs codeartifact:GetAuthorizationToken, codeartifact:ReadFromRepository and sts:GetServiceBearerToken. CodeCommit Git credentials are not involved: CodeCommit hosts Git repositories, while CodeArtifact authenticates with short-lived authorization tokens.
  • Second, generate an authorization token, for example with: aws codeartifact get-authorization-token --domain my_domain --domain-owner 111122223333 --query authorizationToken --output text. Tokens expire (after 12 hours by default), so a new one is needed for each new environment.
  • Third, point pip at the repository before installing the dependencies: pip config set global.index-url https://aws:<token>@<domain>-<aws-account-id>.d.codeartifact.<region>.amazonaws.com/pypi/<repo>/simple/. The user name is the literal string aws, <token> is the value from the previous step, <aws-account-id> is the account ID of the domain owner, and <region> is the AWS region of the repository.
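A CodeArtifact authorization token can also be fetched from Python (for example, in a notebook cell) with boto3's get_authorization_token, the API behind the aws codeartifact get-authorization-token CLI command. The sketch below is a hypothetical helper, not an official Databricks recipe. It assumes boto3 is installed and AWS credentials with codeartifact:GetAuthorizationToken and sts:GetServiceBearerToken are available. The duration check reflects my reading of the AWS API reference (0, or 900 to 43200 seconds). The optional client argument only exists so the function can be exercised without AWS.

```python
from typing import Any, Optional


def get_codeartifact_token(
    domain: str,
    domain_owner: str,
    region: str,
    duration_seconds: int = 3600,
    client: Optional[Any] = None,
) -> str:
    """Return a CodeArtifact authorization token (hypothetical helper)."""
    # 0 ties the token to the caller's session; otherwise 15 minutes to 12 hours.
    if duration_seconds != 0 and not 900 <= duration_seconds <= 43200:
        raise ValueError("duration_seconds must be 0 or between 900 and 43200")
    if client is None:
        import boto3  # imported lazily so the helper can be tested with a fake client

        client = boto3.client("codeartifact", region_name=region)
    response = client.get_authorization_token(
        domain=domain,
        domainOwner=domain_owner,
        durationSeconds=duration_seconds,
    )
    return response["authorizationToken"]
```

The returned token is what goes after aws: in the index URL used by pip config set global.index-url.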

I hope this helps you connect your AWS CodeArtifact repo to Databricks. If you have any further questions, please feel free to ask. 😊

Got it! Thanks @Kaniz_Fatma. That's actually the same thing I do to configure the repo on my local computer. However, I need this to work from the Model Registry: when a model registered in Unity Catalog (UC) is deployed to Model Serving, its requirements should be installed from my private repo. How can I run this pip config command during the deployment so that pip knows where to install the packages from?

Thanks again in advance!
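One possible direction, offered as an unverified sketch: pip accepts option lines such as --index-url inside a requirements file, so the index can travel with the model's requirements instead of a separate pip config call. Whether MLflow and Model Serving keep such a line unchanged when rebuilding the model environment is an assumption to verify. The helper and package names below are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def write_requirements_with_index(
    path: Path, index_url: str, packages: Iterable[str]
) -> List[str]:
    """Write a requirements file whose first line tells pip which index to use.

    pip honors option lines such as `--index-url` inside a requirements file.
    That MLflow / Model Serving preserve this line is an assumption.
    """
    lines = [f"--index-url {index_url}", *packages]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

MLflow's log_model functions accept a path to a requirements file through pip_requirements, so such a file could be passed there. Two caveats: with --index-url, public dependencies must also resolve through CodeArtifact (for example via an upstream connection to PyPI), and a token baked into the model stops working once it expires. pip also reads PIP_INDEX_URL from the environment, so if the serving endpoint lets you set environment variables (ideally from a secret), that would avoid storing the token in the model at all.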
