Not Able To Access GCP storage bucket from Databricks
2 weeks ago
While running:
Getting error: `java.io.IOException: Invalid PKCS8 data.`
Cluster Spark config:
```properties
spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/newscope/gsaprivatekeyid}}
spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/newscope/gsaprivatekeynew}}
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.project.id <projectid>
spark.hadoop.fs.gs.auth.service.account.email <email id>
```
I have followed this document: Connect to Google Cloud Storage - Azure Databricks | Microsoft Learn
Please help.
2 weeks ago
Troubleshooting and Resolution for java.io.IOException: Invalid PKCS8 data
The error `java.io.IOException: Invalid PKCS8 data` typically occurs when there is an issue with the private key format or its storage in Databricks secrets. Based on the provided cluster Spark configuration and the referenced document, here are the potential causes and their resolutions:
Step 1: Verify the Private Key Format
- Ensure the value stored in the secret matches the `private_key` field from your GCP service account JSON, including the BEGIN/END delimiters:
```json
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQI...\n-----END PRIVATE KEY-----\n"
```
Step 2: Verify the Stored Secrets
- Retrieve the secrets from a notebook to confirm the scope and key names resolve:
```python
dbutils.secrets.get(scope="newscope", key="gsaprivatekeynew")
dbutils.secrets.get(scope="newscope", key="gsaprivatekeyid")
```
- The calls should return the stored values without errors or additional whitespace.
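Secret values are redacted in notebook output, so one hedged way to sanity-check the stored key is to print derived values rather than the key itself:
```python
# Sketch: sanity-check the stored key without revealing it. Notebook output
# redacts the secret value itself, but derived values such as lengths print fine.
key = dbutils.secrets.get(scope="newscope", key="gsaprivatekeynew")

print(len(key))                                       # a 2048-bit RSA PEM key is roughly 1700 chars
print(key.startswith("-----BEGIN PRIVATE KEY-----"))  # delimiters present?
print("\\n" in key)                                   # True means literal \n escapes were stored, not real newlines
```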
Step 3: Confirm the Spark Configuration
- Double-check that the cluster Spark configuration matches the setup described in the document:
- *Service Account Email:* Ensure this matches the email value from your GCP service account JSON.
- *Project ID:* Verify the project ID is correct and matches your GCP project.
Here is the corrected Spark configuration example:
```properties
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <service-account-email>
spark.hadoop.fs.gs.project.id <project-id>
spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/newscope/gsaprivatekeynew}}
spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/newscope/gsaprivatekeyid}}
```
After restarting the cluster, verify access with a simple read:
```python
df = spark.read.format("csv").option("header", "true").load("gs://<bucket-name>/<path>")
df.show()
```
2 weeks ago
@BigRoux, can you please suggest what value I should store in the private key secret? Just the part between the BEGIN and END lines? That is what I am saving, and I still get the error.
2 weeks ago
What is the error you are getting? More context is needed here.
2 weeks ago
Same error: `java.io.IOException: Invalid PKCS8 data.`
```json
"private_key": "-----BEGIN PRIVATE KEY-----\n --have stored this value present between these two--\n-----END PRIVATE KEY-----\n",
```
2 weeks ago
Here is an example of a properly formatted and delimited PKCS#8 private key in PEM format. This format includes the required headers and footers:
```
-----BEGIN PRIVATE KEY-----
MIIBVgIBADANBgkqhkiG9w0BAQEFAASCAUAwggE8AgEAAkEAq7BFUpkGp3+LQmlQ
Yx2eqzDV+xeG8kx/sQFV18S5JhzGeIJNA72wSeukEPojtqUyX2J0CciPBh7eqclQ
2zpAswIDAQABAkAgisq4+zRdrzkwH1ITV1vpytnkO/NiHcnePQiOW0VUybPyHoGM
/jf75C5xET7ZQpBe5kx5VHsPZj0CBb3b+wSRAiEA2mPWCBytosIU/ODRfq6EiV04
lt6waE7I2uSPqIC20LcCIQDJQYIHQII+3YaPqyhGgqMexuuuGx+lDKD6/Fu/JwPb
5QIhAKthiYcYKlL9h8bjDsQhZDUACPasjzdsDEdq8inDyLOFAiEAmCr/tZwA3qeA
ZoBzI10DGPIuoKXBd3nk/eBxPkaxlEECIQCNymjsoI7GldtujVnr1qT+3yedLfHK
srDVjIT3LsvTqw==
-----END PRIVATE KEY-----
```
Explanation:
- Headers and Footers: The key begins with `-----BEGIN PRIVATE KEY-----` and ends with `-----END PRIVATE KEY-----`. These delimiters are mandatory in PEM format.
- Base64 Encoding: The content between the headers is the Base64-encoded representation of the private key data.
- Line Breaks: The encoded data is split into lines of 64 characters for readability, though this is not strictly required by all tools.
This format is widely used for storing private keys in PKCS#8 syntax, which supports various cryptographic algorithms.
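If you want to confirm programmatically that what you stored parses as PKCS#8, here is a minimal sketch using the `cryptography` package (assuming it is installed on the cluster):
```python
# Sketch: confirm the secret parses as a PKCS#8 PEM private key.
# Assumes the `cryptography` package is available on the cluster.
from cryptography.hazmat.primitives.serialization import load_pem_private_key

pem = dbutils.secrets.get(scope="newscope", key="gsaprivatekeynew").encode()
try:
    load_pem_private_key(pem, password=None)
    print("Key parses as a valid PKCS#8 private key")
except ValueError as e:
    print(f"Key did not parse: {e}")
```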
Further, if you are still encountering problems, I would suggest you try using Databricks secret scopes. This way you don't have to expose the key, which is a security anti-pattern.
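For reference, a sketch of storing the key through the Python SDK (the `databricks-sdk` package and the `sa.json` path are assumptions; the scope and key names are the ones already used in this thread):
```python
# Sketch: write the decoded private key into a Databricks secret scope via
# the databricks-sdk, so the raw key never appears in notebook source.
import json
from databricks.sdk import WorkspaceClient

with open("sa.json") as f:                # placeholder path
    sa = json.load(f)                     # decodes \n escapes into real newlines

w = WorkspaceClient()
w.secrets.create_scope(scope="newscope")  # fails if the scope already exists
w.secrets.put_secret(scope="newscope", key="gsaprivatekeynew",
                     string_value=sa["private_key"])
w.secrets.put_secret(scope="newscope", key="gsaprivatekeyid",
                     string_value=sa["private_key_id"])
```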
Cheers, Louis.
2 weeks ago
@BigRoux, after updating the key we are getting a different error:
2 weeks ago
At this point it is out of my area of knowledge and I don't have any further suggestions. You may want to consider contacting Databricks Support if you have a support contract.

