Reading a table from a catalog that is in a different/external workspace
03-04-2024 01:33 PM
I am trying to read a table that is hosted in a different workspace. We have been told to establish a connection to that workspace using a token and consume the table directly.
The code I am using is:
from databricks import sql

connection = sql.connect(
    server_hostname="adb-123.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/hahaha",
    access_token="pass"
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM table")

# Fetch all rows into a list
rows = cursor.fetchall()

# Create a PySpark DataFrame from the list of rows
df = spark.createDataFrame(rows, schema=["A", "B", "C"])

# Close the cursor and connection when done
cursor.close()
connection.close()
Is there a better way of doing this? Is there a way to directly run a Spark read and get the data into a PySpark DataFrame? This method doesn't seem very efficient. Please note that we have to go with the token here and consume the table directly.
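One incremental improvement to the snippet above keeps the token-based connection but pulls the result set as Arrow instead of Python row objects, which is lighter for large results. This is only a sketch: it assumes a recent databricks-sql-connector that exposes fetchall_arrow(), and the hostname, HTTP path, token, and table name are placeholders.

from databricks import sql

with sql.connect(
    server_hostname="adb-123.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/hahaha",
    access_token="pass",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM catalog.schema.my_table")
        # Returns a pyarrow.Table instead of a list of Row objects
        arrow_table = cursor.fetchall_arrow()

# Spark infers the schema from the pandas dtypes
df = spark.createDataFrame(arrow_table.to_pandas())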
03-05-2024 12:33 AM
Hi Addy
Greetings!
You can also use Delta Sharing to share data across multiple workspaces. Since you want to read tables from another workspace, you can use Databricks-to-Databricks Delta Sharing.
https://docs.databricks.com/en/data-sharing/read-data-databricks.html
https://docs.databricks.com/en/data-sharing/index.html
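With Databricks-to-Databricks sharing, once the provider has added the table to a share and granted it to the consuming metastore, the consumer just mounts the share as a catalog and reads it like any other Unity Catalog table. A rough sketch (provider, share, catalog, schema, and table names below are placeholders):

# Run once on the consuming workspace to mount the share as a catalog
spark.sql(
    "CREATE CATALOG IF NOT EXISTS shared_catalog "
    "USING SHARE `provider_name`.`share_name`"
)

# After that, the shared table behaves like a normal Unity Catalog table
df = spark.table("shared_catalog.some_schema.my_table")
df.show()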
03-07-2024 08:48 AM
Hi Allia,
Thanks for the helpful links. Unfortunately, the client I am working with has a hard requirement that we use the token. My first instinct was also to set up a Delta Share, but that is not an option here. I am looking for a more efficient way to read the data while still using the token. As you can see in my code, I fetch all the rows first and then build a DataFrame from them; I am looking for a more efficient approach, ideally ingesting directly into a DataFrame.
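One pattern that might satisfy the token constraint while landing the data directly in a Spark DataFrame is a JDBC read against the remote SQL warehouse. This is only a sketch: it assumes the Databricks JDBC driver (com.databricks.client.jdbc.Driver) is installed on the cluster running the code, and the host, HTTP path, token, and table name are placeholders.

# JDBC URL pointing at the remote workspace's SQL warehouse, authenticating with a PAT (AuthMech=3)
jdbc_url = (
    "jdbc:databricks://adb-123.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=/sql/1.0/warehouses/hahaha"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("user", "token")                        # literal "token" when authenticating with a PAT
    .option("password", "pass")                     # the personal access token itself
    .option("dbtable", "catalog.schema.my_table")   # fully qualified remote table
    .load()
)

The read still goes through the remote warehouse, so for large tables it may help to push a filter down by using the "query" option instead of "dbtable".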

