Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark Kafka Client Not Using Certs from Default truststore

Mahtab67
New Contributor

Hi Team, 

I'm working on connecting Databricks to an external Kafka cluster secured with SASL_SSL (SCRAM-SHA-512 + certificate trust). We've encountered an issue where certificates imported into the default JVM truststore (cacerts) via an init script are not picked up by Spark's Kafka connector unless we explicitly create and reference a .jks truststore.
What We've Done:
In a cluster init script, we split our PEM bundle and import each certificate into the default cacerts truststore:

# CERTS, PEM_FILE, KEYSTORE and PASSWORD are defined elsewhere in the init script.
# Extract the N-th certificate from the PEM bundle and import it into the truststore.
for N in $(seq 0 $((CERTS - 1))); do
  ALIAS="custom-cert-$N"
  awk "n==$N{print} /END CERTIFICATE/{n++}" "$PEM_FILE" | \
    keytool -noprompt -import -trustcacerts \
    -alias "$ALIAS" -keystore "$KEYSTORE" -storepass "$PASSWORD"
done

This correctly added entries like custom-cert-0 and custom-cert-1 to the cacerts store. We verified them using keytool -list. 

Despite this, .write.format("kafka") failed with SSL handshake errors until we disabled hostname verification with:

 
.option("kafka.ssl.endpoint.identification.algorithm", "")

With that workaround we could push messages to the Kafka topic and read them back from a Databricks notebook, but not securely, since hostname verification was disabled. We also tried pointing the Kafka options at a .jks truststore via both abfss:// paths and /mnt volume mounts, but neither let us reference the truststore reliably. Since we're using Unity Catalog, we're also restricted from referencing /dbfs paths directly in code, which further limits our options.

Any guidance or workaround would be greatly appreciated.

Thank you in advance!

 
1 REPLY

lingareddy_Alva
Honored Contributor II

Hi @Mahtab67 

This is a common issue with Databricks and Kafka SSL connectivity.
The problem stems from how Spark's Kafka connector handles SSL context initialization versus the JVM's default truststore.

Root Cause Analysis:
The Spark Kafka connector creates its own SSL context and doesn't automatically inherit certificates from the JVM's default cacerts truststore.
When you disable hostname verification, you're bypassing certificate validation entirely, which explains why it works but isn't secure.

Solution Options
Option 1: JVM System Properties (Recommended)
Set JVM-level SSL properties in your cluster configuration. This forces all SSL connections to use your custom truststore:
Cluster Spark Config:

spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=/path/to/your/truststore.jks -Djavax.net.ssl.trustStorePassword=your_password -Djavax.net.ssl.trustStoreType=JKS
spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/path/to/your/truststore.jks -Djavax.net.ssl.trustStorePassword=your_password -Djavax.net.ssl.trustStoreType=JKS
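
A quick way to confirm the driver JVM actually picked up these properties is to read them back from a notebook (a minimal sketch; spark._jvm is a private PySpark gateway, so treat this purely as a debugging aid):

# Print the truststore properties the driver JVM sees (`spark` is provided in Databricks notebooks).
print(spark._jvm.java.lang.System.getProperty("javax.net.ssl.trustStore"))
print(spark._jvm.java.lang.System.getProperty("javax.net.ssl.trustStoreType"))

Note that the truststore file must exist at the same path on the driver and every executor, which usually means distributing it with an init script.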

Option 2: Kafka-Specific SSL Configuration
Instead of relying on the default truststore, explicitly configure Kafka SSL options:

df.write \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "your-kafka-servers:9093") \
  .option("kafka.security.protocol", "SASL_SSL") \
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512") \
  .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.scram.ScramLoginModule required username='user' password='pass';") \
  .option("kafka.ssl.truststore.location", "/path/to/truststore.jks") \
  .option("kafka.ssl.truststore.password", "truststore_password") \
  .option("kafka.ssl.truststore.type", "JKS") \
  .option("kafka.ssl.endpoint.identification.algorithm", "https") \
  .option("topic", "your-topic") \
  .save()
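
To avoid hard-coding the SCRAM credentials in the JAAS string, you can build it from a secret scope (a sketch; the scope and key names below are placeholders, not existing objects):

# Hypothetical secret scope/keys -- replace with your own.
kafka_user = dbutils.secrets.get(scope="kafka-creds", key="username")
kafka_password = dbutils.secrets.get(scope="kafka-creds", key="password")

jaas_config = (
    "org.apache.kafka.common.security.scram.ScramLoginModule required "
    f"username='{kafka_user}' password='{kafka_password}';"
)

The resulting jaas_config can then be passed to kafka.sasl.jaas.config in place of the inline string above.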

Option 3: Unity Catalog Compatible Approach
Since you're using Unity Catalog, store your truststore in a volume and reference it properly:
1. Create/Upload truststore to Unity Catalog volume:

-- Create volume if not exists
CREATE VOLUME IF NOT EXISTS your_catalog.your_schema.kafka_certs;

2. Upload your .jks file to the volume via UI or:

# Copy from local filesystem to volume
dbutils.fs.cp("file:/tmp/truststore.jks", "/Volumes/your_catalog/your_schema/kafka_certs/truststore.jks")

3. Reference in Kafka configuration:

truststore_path = "/Volumes/your_catalog/your_schema/kafka_certs/truststore.jks"

df.write \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "your-servers:9093") \
  .option("kafka.security.protocol", "SASL_SSL") \
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512") \
  .option("kafka.sasl.jaas.config", jaas_config) \
  .option("kafka.ssl.truststore.location", truststore_path) \
  .option("kafka.ssl.truststore.password", truststore_password) \
  .option("topic", "your-topic") \
  .save()
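
Unity Catalog volumes are FUSE-mounted on the cluster, so you can sanity-check the path before running the write (a sketch, using the example path above):

import os

truststore_path = "/Volumes/your_catalog/your_schema/kafka_certs/truststore.jks"
print(os.path.exists(truststore_path))  # should print True on the driver if the upload succeeded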

Option 4: Init Script with Proper JVM Configuration

Modify your init script to not only import certificates but also set system properties:

#!/bin/bash

# Your existing certificate import logic
for N in $(seq 0 $((CERTS - 1))); do
  ALIAS="custom-cert-$N"
  awk "n==$N{print} /END CERTIFICATE/{n++}" "$PEM_FILE" | \
    keytool -noprompt -import -trustcacerts \
    -alias "$ALIAS" -keystore "$KEYSTORE" -storepass "$PASSWORD"
done

# Create a separate truststore for Kafka
KAFKA_TRUSTSTORE="/databricks/driver/kafka-truststore.jks"
cp "$KEYSTORE" "$KAFKA_TRUSTSTORE"

# Set environment variables
echo "export KAFKA_TRUSTSTORE_LOCATION=$KAFKA_TRUSTSTORE" >> /databricks/spark/conf/spark-env.sh
echo "export KAFKA_TRUSTSTORE_PASSWORD=$PASSWORD" >> /databricks/spark/conf/spark-env.sh

Best Practice Recommendation
I recommend Option 3 (the Unity Catalog volume approach) combined with explicit Kafka SSL configuration because it:
- is Unity Catalog compliant
- provides explicit control over SSL settings
- maintains security best practices
- is auditable and manageable

 

 

LR
