Spark readStream kafka.ssl.keystore.location abfss path

mwoods
New Contributor III

Similar to https://community.databricks.com/t5/data-engineering/kafka-unable-to-read-client-keystore-jks/td-p/2..., the documentation (https://learn.microsoft.com/en-gb/azure/databricks/structured-streaming/kafka#use-ssl-to-connect-azu...) recommends keeping the certificates used to authenticate with Kafka in cloud storage, and its example seems to imply that they can be read directly from that location. In practice, however, Spark appears unable to read from abfss paths directly.

Setting kafka.ssl.keystore.location and kafka.ssl.truststore.location to abfss paths results in the following error for me:

kafkashaded.org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
...
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore abfss://{container}@{account}.dfs.core.windows.net/client.keystore.p12 of type PKCS12
...
Caused by: java.nio.file.NoSuchFileException: abfss:/{container}@{account}.dfs.core.windows.net/client.keystore.p12

The paths have been double-checked, and the account has been granted read permission on the external location.
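For reference, a minimal sketch of the options involved, assuming a PySpark notebook. The option keys are standard Spark Kafka source options (keys prefixed with kafka. are passed through to the Kafka client); the helper name, broker address, topic and password values are hypothetical placeholders. The last line of the trace (abfss:/ with a single slash) is consistent with the URI being opened as a plain local file path via java.nio, which collapses repeated slashes.

```python
from typing import Dict


def kafka_ssl_options(
    bootstrap_servers: str,
    topic: str,
    keystore_location: str,
    keystore_password: str,
    truststore_location: str,
    truststore_password: str,
) -> Dict[str, str]:
    """Build Spark Kafka source options for SSL client authentication.

    Hypothetical helper. Options prefixed with "kafka." are forwarded to the
    underlying Kafka clients (admin client on the driver, consumers on executors).
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "kafka.security.protocol": "SSL",
        # The keystore in this thread is PKCS12; truststore type is left at
        # the Kafka default here, which is an assumption.
        "kafka.ssl.keystore.type": "PKCS12",
        "kafka.ssl.keystore.location": keystore_location,
        "kafka.ssl.keystore.password": keystore_password,
        "kafka.ssl.truststore.location": truststore_location,
        "kafka.ssl.truststore.password": truststore_password,
    }


# Example (Databricks notebook; values are placeholders):
# opts = kafka_ssl_options(
#     "broker:9093", "my-topic",
#     "abfss://{container}@{account}.dfs.core.windows.net/client.keystore.p12",
#     dbutils.secrets.get(scope="kafka", key="keystore-password"),
#     "abfss://{container}@{account}.dfs.core.windows.net/client.truststore.jks",
#     dbutils.secrets.get(scope="kafka", key="truststore-password"),
# )
# df = spark.readStream.format("kafka").options(**opts).load()
```

Passing abfss URIs for the two location options is the configuration that produced the error above.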

Can someone confirm that reading directly from abfss paths is not possible, and that the recommended approach is to copy the file(s) to a local temp path first and reference those paths in the kafka.ssl.*.location config options? Or should reading directly from abfss paths work?
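A minimal sketch of the copy-to-local workaround, assuming a Databricks Python notebook. The helper name stage_certs_locally is hypothetical. The copy function is injected so that on Databricks you can pass dbutils.fs.cp (which needs a file: prefix for local destinations), while the helper itself only uses the standard library. One caveat: the Kafka consumers run on the executors and open the keystore there too, so on a multi-node cluster a copy on the driver's local disk alone may not be enough, and a location visible on every node may be needed.

```python
import os
import tempfile
from typing import Callable, Dict, Optional


def stage_certs_locally(
    remote_paths: Dict[str, str],
    copy: Callable[[str, str], object],
    local_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Copy certificate files to a local directory and return the local paths.

    remote_paths maps a Kafka option name (e.g. "kafka.ssl.keystore.location")
    to a remote URI. copy is called as copy(src, dst); on Databricks this would
    be dbutils.fs.cp, whose local destinations use a "file:" prefix.
    """
    local_dir = local_dir or tempfile.mkdtemp(prefix="kafka-certs-")
    local_paths = {}
    for option, remote in remote_paths.items():
        # Reuse the original file name (last segment of the URI).
        name = remote.rstrip("/").rsplit("/", 1)[-1]
        local = os.path.join(local_dir, name)
        copy(remote, "file:" + local)
        # The Kafka client expects a plain filesystem path, without "file:".
        local_paths[option] = local
    return local_paths
```

On Databricks this would be called as stage_certs_locally({"kafka.ssl.keystore.location": "abfss://{container}@{account}.dfs.core.windows.net/client.keystore.p12"}, dbutils.fs.cp), and the returned mapping merged into the options passed to spark.readStream.format("kafka").options(...).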


3 Replies

Kaniz
Community Manager

Hi @mwoods, based on the information provided, there may be an issue with reading directly from abfss paths when kafka.ssl.keystore.location and kafka.ssl.truststore.location are set to abfss paths.

The errors you are encountering suggest a problem with either the permissions or the configuration of the storage credentials.

Here are some steps you could take to troubleshoot this issue:

1. Check that the files are being accessed from a Unity Catalog (UC)-enabled cluster.
2. Test the connection for the configured external location and check that READ FILES succeeds.
3. If the storage credential is missing READ FILES access, grant it, for example with a command like GRANT READ FILES ON STORAGE CREDENTIAL my_aws_storage_cred TO ceo;.
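Step 3 can be run from a notebook with spark.sql. A small sketch that builds the statement, assuming the securable is either a storage credential or an external location; the helper name and the securable name in the usage line are hypothetical. Backtick quoting matters when the principal is a group whose name contains a space, such as the Data Engineer group mentioned in the follow-up reply.

```python
def grant_read_files(securable_type: str, securable_name: str, principal: str) -> str:
    """Build a Unity Catalog GRANT READ FILES statement (hypothetical helper).

    Identifiers are wrapped in backticks so that names containing spaces,
    such as the group "Data Engineer", are parsed as a single identifier.
    """
    kind = securable_type.upper()
    if kind not in {"STORAGE CREDENTIAL", "EXTERNAL LOCATION"}:
        raise ValueError(f"unsupported securable type: {securable_type}")

    def quote(identifier: str) -> str:
        # A literal backtick inside a quoted identifier is escaped by doubling it.
        return "`" + identifier.replace("`", "``") + "`"

    return f"GRANT READ FILES ON {kind} {quote(securable_name)} TO {quote(principal)}"


# Example (Databricks notebook; "certs_loc" is a placeholder name):
# spark.sql(grant_read_files("EXTERNAL LOCATION", "certs_loc", "Data Engineer"))
```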

If these steps do not resolve the issue, you may need to copy the files to a local temp path first and reference those paths in the kafka.ssl.*.location config options.

mwoods
New Contributor III

Hi @Kaniz - thanks for your response.

Not sure what happened here... this is now working for me. Either the issue has been patched, or it was related to my group management: the external location read permissions were mapped to a "Data Engineer" group that existed at the account level but was not properly assigned to the workspace via databricks_mws_permission_assignment (which I have now rectified).

I think the latter is probably the case, though in that situation I'm not sure why I was still able to copy the file successfully via dbutils.fs.cp as a workaround up to now (which seemed to imply that I did have read access to the file).

mwoods
New Contributor III

Accepted Solution

@Kaniz - quick update: I managed to find the cause. It's neither of the above; it's a bug in the Databricks 14.0 runtime. I had switched back to the 13.3 LTS runtime, and that is what made the error disappear.

As soon as I try to read directly from abfss paths on a compute resource running DBR 14.0, the error returns, so it appears a bug was introduced in that release.
