10-12-2021 08:01 AM
Hello,
We have some Scala code which is compiled and published to an Azure DevOps Artifacts feed.
The issue is we're trying to now add this JAR to a Databricks job (through Terraform) to automate the creation.
To do this I'm trying to authenticate using a generated token but am now getting an error when trying to add the library:
Run result unavailable: job failed with error message
Library installation failed for library due to user error for maven {
coordinates: "groupId:artifactId:version"
repo: "https://user:token@ORG.pkgs.visualstudio.com/FEED_ID/_packaging/FEED_NAME/maven/v1"
}
Any idea how to set this up properly? I know people have figured this out for PyPi but haven't found anything for Maven.
The other option is putting it on ADLS and adding it from there, but I'd prefer direct integration without a middle-man.
11-25-2021 10:47 AM
As of right now, Databricks can't use non-public Maven repositories, because Maven coordinates are resolved in the control plane. That's different from R and Python libraries. As a workaround, you can install the libraries via an init script, or upload them to ADLS or S3 and refer to them by URL in the cluster config.
10-13-2021 12:17 AM
Hey thanks for the welcome and likewise!
I do get that using Scala isn't as popular as Python for Databricks and using JARs to assemble jobs makes it even more special of a use-case, but if this is possible it would align nicely for automation.
10-12-2021 08:45 AM
In the past it was said that Databricks doesn't support internal Maven libraries. I no longer see that wording in the documentation, but I guess it's still the case.
The only idea I have is to use Azure Pipelines plus the Databricks CLI:
databricks libraries install --cluster-id 1234-567890-lest123 --jar dbfs:/test-dir/test.jar
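For a pipeline step, the upload and the install can be chained. This is only a sketch: it assumes the legacy Databricks CLI is installed and already configured with a host and token, and `upload_and_install` is a made-up helper name, not part of the CLI.

```shell
#!/bin/bash
# Hypothetical helper: copy a locally built JAR to DBFS, then install it on a cluster.
# Arguments: $1 = local JAR path, $2 = target dbfs:/ path, $3 = cluster ID.
upload_and_install() {
  # Overwrite any previous upload so a re-run of the pipeline picks up the new build.
  databricks fs cp --overwrite "$1" "$2" &&
    # Install only if the upload succeeded.
    databricks libraries install --cluster-id "$3" --jar "$2"
}

# Example call (paths and cluster ID taken from the command above; the local JAR path is a placeholder):
# upload_and_install target/test.jar dbfs:/test-dir/test.jar 1234-567890-lest123
```

The function returns non-zero if either CLI call fails, so the pipeline step fails instead of silently leaving an old JAR in place.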
10-13-2021 12:14 AM
Hey thanks for the reply!
Yeah I read the same about internal maven libraries, but I'm not sure what "internal" means in this context. If I try to resolve the repository URL using user:token locally it authenticates perfectly fine from a public address.
The problem is that the error given by Databricks is the same even when you remove the user:token from the URL, so you don't get any feedback on what's going wrong.
Your idea is indeed also a possible solution but it's similar to putting it on ADLS and installing from there.
10-18-2021 05:58 AM
Any other ideas?
08-16-2022 08:55 AM
@Alex Ott do you have an example of an init script that can copy the jar file from ADLS to /databricks/jars? I cannot seem to get connected to ADLS over abfss nor https from an init script.
08-19-2022 05:28 AM
It's two different things:
cp /dbfs/FileStore/jars/name1.jar /databricks/jars/
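Wrapped up as a cluster init script, the copy above could look roughly like this. It is a sketch that assumes the JAR has already been uploaded to DBFS and that the /dbfs mount is available when the script runs. `install_jar`, `SRC_JAR` and `DEST_DIR` are names made up for the example.

```shell
#!/bin/bash
# Sketch of an init script: copy a JAR from the DBFS mount into the cluster's jar directory.
# SRC_JAR and DEST_DIR are hypothetical variables; the defaults are the paths from the command above.
SRC_JAR="${SRC_JAR:-/dbfs/FileStore/jars/name1.jar}"
DEST_DIR="${DEST_DIR:-/databricks/jars}"

# Copy the file $1 into the directory $2; return 1 if the source file is missing.
install_jar() {
  if [ ! -f "$1" ]; then
    echo "JAR not found: $1" >&2
    return 1
  fi
  mkdir -p "$2" && cp "$1" "$2/"
}

# Log and continue if the JAR is missing, so the cluster still starts.
install_jar "$SRC_JAR" "$DEST_DIR" || echo "Skipping JAR install for $SRC_JAR" >&2
```

If the cluster should not start without the library, replace the final `echo` with `exit 1`.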