08-11-2023 02:11 PM
I was recently given a credential file to access shared data via Delta Sharing. I am following the documentation from https://docs.databricks.com/en/data-sharing/read-data-open.html. The documentation expects the contents of the credential file to be stored in a folder in DBFS. I would like to use Azure Key Vault instead.
Therefore, instead of using (under "Step 2: Use a notebook to list and read shared tables" in the above URL):
import delta_sharing

client = delta_sharing.SharingClient(f"/dbfs/<dbfs-path>/config.share")
client.list_all_tables()
I am using:
credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
client = delta_sharing.SharingClient(profile=profile)
client.list_all_tables()
The above works fine. I can list the tables. Now I would like to load a table using Spark. The documentation suggests using
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
But that relies on having stored the contents of the credential file in a folder in DBFS and using that path for <profile-path>. Is there an alternative way to do this with the "profile" variable I am using? By the way, the code is bold instead of formatted in code blocks because I kept getting errors that prevented me from posting.
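For reference, here is a minimal sketch of one possible workaround, assuming load_as_spark only accepts a path-based URL: materialize the secret into a short-lived driver-local file and pass that path in place of <profile-path>. The tempfile usage here is illustrative, not from the linked docs.
import tempfile
import delta_sharing

# dbutils is provided by Databricks notebooks
credentials = dbutils.secrets.get(scope='redacted', key='redacted')
# write the profile JSON to a short-lived file on the driver instead of DBFS
with tempfile.NamedTemporaryFile(mode='w', suffix='.share', delete=False) as f:
    f.write(credentials)
    profile_path = f.name
# use the temporary path in place of <profile-path>; depending on the
# cluster's default filesystem, a file:// prefix may be needed
df = delta_sharing.load_as_spark(f"{profile_path}#<share-name>.<schema-name>.<table-name>")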
08-21-2023 12:29 AM
Hi, you can create a secret and store the key inside it. Alternatively, use a local tool to Base64-encode the contents of your JSON key file, create a secret in a Databricks-backed scope, and copy & paste the Base64-encoded text into the secret value. After that, you can reference the secret in your cluster's Spark config: credentials {{secrets/<scope-name>/<secret-name>}}
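For example, the local Base64-encoding step could look like this (a sketch; config.share stands in for your downloaded credential file):
import base64

with open("config.share", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")
# paste the printed value into the secret
print(encoded)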
Please tag @Debayan in your next comment so that I get notified. Thanks!
08-21-2023 06:57 AM
Hi @Debayan, thanks for your response!
I'm trying to understand your instructions. The content of my credential file is (I've replaced confidential information with "xyz"):
{"shareCredentialsVersion":1, "bearerToken":"xyz", "endpoint":"xyz", "expirationTime":"2023-09-10T04:10:49.277Z"}
I put that content in a secret in a Databricks-backed scope, and can access it:
credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
Now, instead of doing
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
as suggested in the documentation, I was hoping to use the profile variable I created in place of <profile-path>. Is that possible? I was thinking there has to be a way, because the profile variable holds the same information as the config.share file.
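For what it's worth, the parsed profile does expose the same fields as the file, which can be checked like this (a sketch; the snake_case field names assume the delta-sharing Python dataclass):
import delta_sharing

credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
# the dataclass fields mirror the config.share JSON keys
print(profile.share_credentials_version)
print(profile.endpoint)
print(profile.expiration_time)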
08-21-2023 11:48 PM
Hi, the most feasible way would be to Base64-encode the contents of your key file and then set only the following Spark config:
credentials <base64-encoded contents>
08-22-2023 02:21 PM
Hi @Debayan, do you have some example code you can share?
08-23-2023 01:01 AM
Hi, there is no code as such; you only have to add the following syntax to your cluster's Spark config:
credentials <base64-encoded contents>
08-23-2023 07:00 AM
Hi @Debayan, how do I add that syntax to the Spark config?
08-23-2023 11:35 PM
Hi, you can add it alongside your other Spark configs, for example:
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client-email>
spark.hadoop.fs.gs.project.id <project-id>
credentials <base64-encoded contents>
07-07-2025 02:15 AM
Hello Alex,
Were you able to fix this issue? What did you do in this case? I am trying to achieve the same thing but am stuck with the same issue.