08-11-2023 02:11 PM
I was recently given a credential file to access shared data via Delta Sharing. I am following the documentation at https://docs.databricks.com/en/data-sharing/read-data-open.html, which expects the contents of the credential file to be stored in a folder in DBFS. I would like to use Azure Key Vault instead.
Therefore, instead of using (under "Step 2: Use a notebook to list and read shared tables" in the above URL):
client = delta_sharing.SharingClient(f"/dbfs/<dbfs-path>/config.share")
client.list_all_tables()
I am using:
import delta_sharing

credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
client = delta_sharing.SharingClient(profile=profile)
client.list_all_tables()
The above works fine. I can list the tables. Now I would like to load a table using Spark. The documentation suggests using
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
But that relies on having stored the contents of the credential file in a folder in DBFS and using that path for <profile-path>. Is there an alternative way to do this with the profile variable I am using?
08-21-2023 12:29 AM
Hi, you can store the key in a secret directly, or you can use a local tool to Base64-encode the contents of your JSON key file, create a secret in a Databricks-backed scope, and paste the Base64-encoded text in as the secret value. You can then reference the secret in your cluster's Spark config like this: credentials {{secrets/<scope-name>/<secret-name>}}
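For example, the encoding step could look like this in Python as the "local tool" (a minimal sketch; the file name config.share is just a placeholder):
import base64

# Sketch: Base64-encode the Delta Sharing credential file so its contents
# can be pasted in as the secret value. The file name is a placeholder.
with open("config.share", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

print(encoded)  # paste this string into the secret value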
Please tag @Debayan in your next comment so I get notified. Thanks!
08-21-2023 06:57 AM
Hi @Debayan, thanks for your response!
I'm trying to understand your instructions. The content of my credential file is (I've replaced confidential information with "xyz"):
{"shareCredentialsVersion":1, "bearerToken":"xyz", "endpoint":"xyz", "expirationTime":"2023-09-10T04:10:49.277Z"}
I put that content in a secret in a Databricks-backed scope, and can access it:
credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
Now, instead of doing
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
as suggested in the documentation, I was hoping to use the profile variable I created in place of <profile-path>. Is that possible? I would think there has to be a way, since the profile variable holds the same information as the config.share file.
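For what it's worth, the only fallback I can think of is writing the secret back out to a temporary file at runtime and passing that path to load_as_spark. A rough, untested sketch (scope and key names redacted; I haven't verified whether the Spark connector resolves a driver-local path, so it may need a file:/ prefix or a DBFS location):
import os
import tempfile
import delta_sharing

credentials = dbutils.secrets.get(scope='redacted', key='redacted')

# Write the profile JSON to a temporary file on the driver node
fd, profile_path = tempfile.mkstemp(suffix=".share")
with os.fdopen(fd, "w") as f:
    f.write(credentials)

df = delta_sharing.load_as_spark(
    f"{profile_path}#<share-name>.<schema-name>.<table-name>"
)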
08-21-2023 11:48 PM
Hi, the most feasible way would be to Base64-encode the contents of your key file and add just this line to the cluster's Spark config:
credentials <base64-encoded contents>
08-22-2023 02:21 PM
Hi @Debayan, do you have some example code you can share?
08-23-2023 01:01 AM
Hi, there is no code as such; you only have to add the line below to your cluster's Spark config:
credentials <base64-encoded contents>
08-23-2023 07:00 AM
Hi @Debayan, where exactly do I put that line in the Spark config?
08-23-2023 11:35 PM
Hi, you can add it alongside your other Spark configs, for example:
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client-email>
spark.hadoop.fs.gs.project.id <project-id>
credentials <base64-encoded contents>
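If you want to sanity-check the secret from a notebook first, a quick sketch (scope and key names are placeholders):
import base64
import json

encoded = dbutils.secrets.get(scope='redacted', key='redacted')
profile = json.loads(base64.b64decode(encoded))
# Expect: bearerToken, endpoint, expirationTime, shareCredentialsVersion
print(sorted(profile.keys()))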