Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Connect Databricks cluster with Artifactory

P10d
New Contributor

Hello,

I'm trying to connect Databricks with our own JFrog Artifactory instance.

The objective is to download both pip and JAR dependencies from it instead of connecting to Maven Central/PyPI.

I'm struggling with the JARs.

My approach to solving the problem is:

1. Create an init script that builds a new truststore containing the CA under which the Artifactory is deployed, saving it in /tmp.

2. Create a new Ivy settings file with the resolvers for the Artifactory repositories.

3. Configure the Spark conf so that it picks everything up. The properties set are:
spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<path-to-jks> -Djavax.net.ssl.trustStorePassword=changeit
spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=<path-to-jks> -Djavax.net.ssl.trustStorePassword=changeit
spark.databricks.library.ivySettings /Volumes/XXX/init_scripts/ivysettings.xml
spark.jars.packages com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
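For step 2, an Ivy settings file along these lines is the usual shape. This is only a sketch: the repository name, host, virtual repo path, and credentials below are placeholders, not your actual setup.

```xml
<ivysettings>
  <!-- Make the Artifactory chain the default resolver -->
  <settings defaultResolver="artifactory"/>

  <!-- Credentials for the Artifactory host (placeholder values) -->
  <credentials host="artifactory.example.com"
               realm="Artifactory Realm"
               username="USER" passwd="TOKEN"/>

  <resolvers>
    <chain name="artifactory">
      <!-- Maven-compatible resolver pointing at a virtual repo that
           proxies Maven Central plus any internal releases -->
      <ibiblio name="central-proxy"
               m2compatible="true"
               root="https://artifactory.example.com/artifactory/maven-virtual/"/>
    </chain>
  </resolvers>
</ivysettings>
```

With spark.databricks.library.ivySettings pointing at this file, Ivy should resolve spark.jars.packages coordinates through Artifactory only.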

The cluster is a Standard one with 16.4 LTS Runtime. 

If anyone can help, I would appreciate it.

Thanks in advance!

1 REPLY

emma_s
Databricks Employee

Hi,

I haven't been able to test this myself, but based on some internal research, I think the following is true:

The most likely issue is your truststore configuration. Setting spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<custom-path> replaces the JVM's entire default truststore rather than extending it. This means the JVM loses all the standard public CAs,
which breaks Ivy/Maven's ability to connect over HTTPS — including to your Artifactory.

Instead of overriding the truststore via Spark conf, add your Artifactory CA to the default Java keystore in your init script:

#!/bin/bash
set -euo pipefail

# Write the Artifactory CA chain to the system CA directory
cat << 'EOF' > /usr/local/share/ca-certificates/artifactory-ca.crt
-----BEGIN CERTIFICATE-----
<your-CA-certificate-chain>
-----END CERTIFICATE-----
EOF

# Refresh the OS-level CA bundle
update-ca-certificates

# Locate the JVM's default keystore and import the CA into it
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
KEYSTORE="$JAVA_HOME/lib/security/cacerts"

keytool -noprompt -import -trustcacerts \
  -alias artifactory-ca \
  -keystore "$KEYSTORE" \
  -storepass changeit \
  -file /usr/local/share/ca-certificates/artifactory-ca.crt

Then remove the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions truststore settings entirely. The JVM will use the updated default keystore which now has both the standard public CAs and your Artifactory CA.
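To sanity-check the import, you can run something like the following from a notebook shell cell or the web terminal on the cluster. This is only a quick verification sketch, not part of the fix; the alias and paths match the init script above, and the host placeholder needs replacing with your own.

```shell
# Confirm the CA landed in the JVM's default keystore
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
keytool -list -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit -alias artifactory-ca

# Confirm HTTPS to Artifactory works with the updated OS bundle
# (api/system/ping is Artifactory's standard health endpoint)
curl -sI https://<your-artifactory-host>/artifactory/api/system/ping
```

If keytool prints the artifactory-ca entry and curl returns a 200, the trust side is in place and any remaining failures are Ivy/repository configuration rather than TLS.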

If this fixes it, could you please mark this as the accepted solution to help others?

Thanks,

Emma