Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark readStream kafka.ssl.keystore.location abfss path

mwoods
New Contributor III

Similar to https://community.databricks.com/t5/data-engineering/kafka-unable-to-read-client-keystore-jks/td-p/2... - the documentation (https://learn.microsoft.com/en-gb/azure/databricks/structured-streaming/kafka#use-ssl-to-connect-azu...) recommends storing the certificates used to authenticate with Kafka in cloud storage, and its example appears to hint that Spark should be able to read them directly from that location. In practice, however, Spark appears unable to read from abfss paths directly.

Setting kafka.ssl.keystore.location and kafka.ssl.truststore.location to abfss paths, for me, results in:

kafkashaded.org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
...
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore abfss://{container}@{account}.dfs.core.windows.net/client.keystore.p12 of type PKCS12
...
Caused by: java.nio.file.NoSuchFileException: abfss:/{container}@{account}.dfs.core.windows.net/client.keystore.p12

The paths have been double-checked and are correct, and the account has been granted read permission on the external location.

Can we get confirmation that reading directly is not possible, and that the recommendation is to copy the file(s) to a local temp path first and reference those local paths in the kafka.ssl.*.location config options? Or should it be possible to read directly from abfss paths?
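For reference, a minimal sketch of the configuration in question, following the SSL example in the linked documentation. All broker, container, account, topic, and secret names below are placeholders, not values from the original post:

```python
# Sketch of the failing setup (placeholder names throughout):
# kafka.ssl.*.location options point directly at cloud storage paths,
# as the documentation's example suggests.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<broker>:9093")
    .option("subscribe", "<topic>")
    .option("kafka.security.protocol", "SSL")
    .option("kafka.ssl.keystore.type", "PKCS12")
    .option(
        "kafka.ssl.keystore.location",
        "abfss://<container>@<account>.dfs.core.windows.net/client.keystore.p12",
    )
    .option("kafka.ssl.keystore.password", dbutils.secrets.get("<scope>", "<key>"))
    .option(
        "kafka.ssl.truststore.location",
        "abfss://<container>@<account>.dfs.core.windows.net/client.truststore.jks",
    )
    .option("kafka.ssl.truststore.password", dbutils.secrets.get("<scope>", "<key>"))
    .load()
)
```

With this configuration, the NoSuchFileException above is raised when the Kafka admin client tries to open the keystore.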

1 ACCEPTED SOLUTION

Accepted Solutions

mwoods
New Contributor III

@Retired_mod - quick update: managed to find the cause. It's neither of the above; it's a bug in the Databricks 14.0 runtime. I had switched back to the 13.3 LTS runtime, and that is what made the error disappear.

As soon as I try to read directly from abfss paths using a compute resource on the DBR 14.0 runtime, I get the error again, so it appears the bug was introduced in that release.


2 REPLIES

mwoods
New Contributor III

Hi @Retired_mod - thanks for your response.

Not sure what's happened here... this is now working for me. Either the issue has been patched, or it was somehow related to my group management: the external location read permissions were mapped to a "Data Engineer" group that existed at the account level but was not actually being mapped via databricks_mws_permission_assignment to the respective workspace (which I have now rectified).

...I think that's probably the case, though in that situation I'm not sure why I was able to copy the file successfully via dbutils.fs.cp as a workaround until now (that seemed to imply I did have read access to the file).
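The dbutils.fs.cp workaround mentioned above can be sketched as follows (paths, broker, topic, and secret names are placeholders): copy the keystore from the external location to local driver storage first, then point the Kafka options at the local copy, which the shaded Kafka client can open as an ordinary file.

```python
# Workaround sketch (placeholder names): copy the keystore to local disk,
# then reference the local path in kafka.ssl.keystore.location.
dbutils.fs.cp(
    "abfss://<container>@<account>.dfs.core.windows.net/client.keystore.p12",
    "file:/tmp/client.keystore.p12",
)

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<broker>:9093")
    .option("subscribe", "<topic>")
    .option("kafka.security.protocol", "SSL")
    .option("kafka.ssl.keystore.type", "PKCS12")
    # Local path, readable via java.nio.file on the driver/executors:
    .option("kafka.ssl.keystore.location", "/tmp/client.keystore.p12")
    .option("kafka.ssl.keystore.password", dbutils.secrets.get("<scope>", "<key>"))
    .load()
)
```

Note that /tmp here is node-local storage, so the copy must be visible on every node that runs the Kafka consumer.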

