08-11-2023 02:11 PM
I was recently given a credential file to access shared data via Delta Sharing. I am following the documentation at https://docs.databricks.com/en/data-sharing/read-data-open.html, which expects the contents of the credential file to be stored in a folder in DBFS. I would like to use Azure Key Vault instead.
Therefore, instead of using (under "Step 2: Use a notebook to list and read shared tables" in the above URL):
client = delta_sharing.SharingClient(f"/dbfs/<dbfs-path>/config.share")
client.list_all_tables()
I am using:
import delta_sharing

credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
client = delta_sharing.SharingClient(profile=profile)
client.list_all_tables()
The above works fine. I can list the tables. Now I would like to load a table using Spark. The documentation suggests using
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
But that relies on having stored the contents of the credential file in a folder in DBFS and using that path for <profile-path>. Is there an alternative way to do this with the profile variable I am using?
08-21-2023 12:29 AM
Hi, you can store the key in a secret directly, or you can use a local tool to Base64-encode the contents of your JSON key file, create a secret in a Databricks-backed scope, and paste the Base64-encoded text in as the secret value. You can then reference the secret in your cluster's Spark config like this: credentials {{secrets/<scope-name>/<secret-name>}}
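For example, the encoding step could look like this in Python as the "local tool" (a minimal sketch; the file name config.share is just a placeholder):
import base64

# Sketch: Base64-encode the Delta Sharing credential file so its contents
# can be pasted in as the secret value. The file name is a placeholder.
with open("config.share", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

print(encoded)  # paste this string into the secret value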
Please tag @Debayan in your next comment so I get notified. Thanks!
08-21-2023 06:57 AM
Hi @Debayan, thanks for your response!
I'm trying to understand your instructions. The content of my credential file is (I've replaced confidential information with "xyz"):
{"shareCredentialsVersion":1, "bearerToken":"xyz", "endpoint":"xyz", "expirationTime":"2023-09-10T04:10:49.277Z"}
I put that content in a secret in a Databricks-backed scope, and can access it:
credentials = dbutils.secrets.get(scope='redacted', key='redacted')
profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)
Now, instead of doing
delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)
as suggested in the documentation, I was hoping to use the profile variable I created in place of <profile-path>. Is that possible? I would think there has to be a way, since the profile variable holds the same information as the config.share file.
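For what it's worth, the only fallback I can think of is writing the secret back out to a temporary file at runtime and passing that path to load_as_spark. A rough, untested sketch (scope and key names redacted; I haven't verified whether the Spark connector resolves a driver-local path, so it may need a file:/ prefix or a DBFS location):
import os
import tempfile
import delta_sharing

credentials = dbutils.secrets.get(scope='redacted', key='redacted')

# Write the profile JSON to a temporary file on the driver node
fd, profile_path = tempfile.mkstemp(suffix=".share")
with os.fdopen(fd, "w") as f:
    f.write(credentials)

df = delta_sharing.load_as_spark(
    f"{profile_path}#<share-name>.<schema-name>.<table-name>"
)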
08-21-2023 11:48 PM
Hi, the most feasible way would be to Base64-encode the contents of your key file and add just this line to the cluster's Spark config:
credentials <base64-encoded contents>
08-22-2023 02:21 PM
Hi @Debayan, do you have some example code you can share?
08-23-2023 01:01 AM
Hi, there is no code as such; you only have to add the line below to your cluster's Spark config:
credentials <base64-encoded contents>
08-23-2023 07:00 AM
Hi @Debayan, where exactly do I put that line in the Spark config?
08-23-2023 11:35 PM
Hi, you can add it alongside your other Spark configs, for example:
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client-email>
spark.hadoop.fs.gs.project.id <project-id>
credentials <base64-encoded contents>
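If you want to sanity-check the secret from a notebook first, a quick sketch (scope and key names are placeholders):
import base64
import json

encoded = dbutils.secrets.get(scope='redacted', key='redacted')
profile = json.loads(base64.b64decode(encoded))
# Expect: bearerToken, endpoint, expirationTime, shareCredentialsVersion
print(sorted(profile.keys()))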