Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Adding JAR from Azure DevOps Artifacts feed to Databricks job

yannickmo
New Contributor III

Hello,

We have some Scala code which is compiled and published to an Azure DevOps Artifacts feed.

The issue is that we're now trying to add this JAR to a Databricks job (through Terraform) to automate the job creation.

To do this I'm trying to authenticate using a generated token but am now getting an error when trying to add the library:

Run result unavailable: job failed with error message
 Library installation failed for library due to user error for maven {
  coordinates: "groupId:artifactId:version"
  repo: "https://user:token@ORG.pkgs.visualstudio.com/FEED_ID/_packaging/FEED_NAME/maven/v1"
}

Any idea how to set this up properly? I know people have figured this out for PyPi but haven't found anything for Maven.

The other option is putting it on ADLS and adding it from there, but I'd prefer direct integration without a middle-man.
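For reference, a hypothetical fragment of the Terraform setup being described: a `databricks_job` resource with a `library`/`maven` block pointing at the private feed. The resource name and coordinates are placeholders taken from the error message above; this is the configuration that fails.

```hcl
# Hypothetical fragment: Maven library on a Databricks job, resolved from a
# private Azure DevOps Artifacts feed. Coordinates and repo URL are the
# placeholders from the error message above.
resource "databricks_job" "this" {
  name = "scala-job"

  library {
    maven {
      coordinates = "groupId:artifactId:version"
      repo        = "https://user:token@ORG.pkgs.visualstudio.com/FEED_ID/_packaging/FEED_NAME/maven/v1"
    }
  }

  # ... task / cluster configuration omitted ...
}
```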

1 ACCEPTED SOLUTION

alexott
Databricks Employee

As of right now, Databricks can't use non-public Maven repositories, because resolution of the Maven coordinates happens in the control plane. That's different from R & Python libraries. As a workaround, you can install the library via an init script, or upload it to ADLS or S3 and reference it by URL in the cluster config.


7 REPLIES

Hey thanks for the welcome and likewise!

I get that Scala isn't as popular as Python on Databricks, and assembling jobs from JARs makes this an even more niche use case, but if it's possible it would fit nicely into our automation.

Hubert-Dudek
Esteemed Contributor III

In the past, the documentation said that Databricks doesn't support internal Maven repositories. I don't see that wording in the documentation anymore, but I guess it's still the case.

The only idea I have is to use Azure Pipelines plus the Databricks CLI:

databricks libraries install --cluster-id 1234-567890-lest123 --jar dbfs:/test-dir/test.jar
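Sketched out, that pipeline approach could look like the script below. All names (org, project, feed, coordinates, cluster ID) are hypothetical, and it assumes the feed exposes the standard Maven repository layout under the newer `pkgs.dev.azure.com` host, so the direct download URL can be derived from the coordinates.

```shell
#!/bin/bash
# Sketch (hypothetical names): download the JAR straight from the Azure DevOps
# Maven feed, push it to DBFS, then install it on the cluster with the
# Databricks CLI.

# Derive the direct download URL from the Maven coordinates, assuming the
# standard Maven repository layout on the feed endpoint.
build_jar_url() {
  local org=$1 project=$2 feed=$3 group=$4 artifact=$5 version=$6
  local group_path
  group_path=$(echo "$group" | tr . /)   # com.example -> com/example
  echo "https://pkgs.dev.azure.com/${org}/${project}/_packaging/${feed}/maven/v1/${group_path}/${artifact}/${version}/${artifact}-${version}.jar"
}

url=$(build_jar_url myorg myproject myfeed com.example myapp 1.0.0)
echo "$url"

# Pipeline steps (AZDO_PAT is a personal access token; any username works
# with basic auth against Azure DevOps feeds):
#   curl -sSf -u "user:$AZDO_PAT" -o myapp-1.0.0.jar "$url"
#   databricks fs cp myapp-1.0.0.jar dbfs:/FileStore/jars/myapp-1.0.0.jar
#   databricks libraries install --cluster-id 1234-567890-test123 \
#       --jar dbfs:/FileStore/jars/myapp-1.0.0.jar
```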

Hey thanks for the reply!

Yeah, I read the same about internal Maven repositories, but I'm not sure what "internal" means in this context. If I try to resolve the repository URL using user:token locally, it authenticates perfectly fine from a public address.

The problem is that Databricks returns the same error even when you remove the user:token from the URL, so you get no feedback on what's actually going wrong.

Your idea is indeed also a possible solution but it's similar to putting it on ADLS and installing from there.

yannickmo
New Contributor III

Any other ideas?

alexott
Databricks Employee

As of right now, Databricks can't use non-public Maven repositories, because resolution of the Maven coordinates happens in the control plane. That's different from R & Python libraries. As a workaround, you can install the library via an init script, or upload it to ADLS or S3 and reference it by URL in the cluster config.

@Alex Ott do you have an example of an init script that can copy the JAR file from ADLS to /databricks/jars? I can't seem to connect to ADLS over abfss or https from an init script.

These are two different things:

  • Installing via init script — in the script you can simply run
cp /dbfs/FileStore/jars/name1.jar /databricks/jars/
  • Installing the library from ADLS — specify the ADLS URL in the cluster UI, and make sure the cluster has a service principal attached so the file can be downloaded
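A minimal sketch of the init-script route, using the path from the `cp` example above: the snippet below just writes the init script to a local file; on a real workspace you would upload it (e.g. to a DBFS or workspace path) and attach it to the cluster configuration.

```shell
#!/bin/bash
# Sketch: generate a cluster init script that copies a JAR from the DBFS FUSE
# mount into /databricks/jars. Source path follows the example above.
cat > copy-jar-init.sh <<'EOF'
#!/bin/bash
# Runs on every node at cluster start: copy the JAR from the DBFS FUSE mount
# into /databricks/jars so it ends up on the JVM classpath.
cp /dbfs/FileStore/jars/name1.jar /databricks/jars/
EOF
chmod +x copy-jar-init.sh
echo "wrote copy-jar-init.sh"
```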
