Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reading a table from a catalog that is in a different/external workspace

addy
New Contributor III

I am trying to read a table that is hosted on a different workspace. We have been told to establish a connection to that workspace using an access token and consume the table directly.

The code I am using is:

from databricks import sql

connection = sql.connect(
    server_hostname="adb-123.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/hahaha",
    access_token="pass"
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM table")

# Fetch all rows into a list
rows = cursor.fetchall()

# Create a PySpark DataFrame from the list of rows
df = spark.createDataFrame(rows, schema=["A", "B", "C"])

# Close the cursor and connection when done
cursor.close()
connection.close()

Is there a better way of doing this? Is there a way to directly run a spark read and get the data into a pyspark dataframe? This method doesn't appear to be that efficient. Please note that we have to go with the token here and directly consume the table.
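One way to avoid materializing the whole result set at once is to stream it in batches with the cursor's fetchmany() instead of fetchall(). The sketch below shows the batching pattern only; FakeCursor is a hypothetical stand-in used for illustration (real code would use the cursor from connection.cursor()), and the batch size of 10,000 is an arbitrary starting point, not a recommendation.

```python
# Sketch: stream rows in batches with fetchmany() instead of loading
# everything into memory with fetchall().

def fetch_in_batches(cursor, batch_size=10_000):
    """Yield lists of rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch

# Hypothetical stand-in cursor, for illustration only; a real
# databricks-sql cursor exposes the same fetchmany() interface.
class FakeCursor:
    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, n):
        out, self._rows = self._rows[:n], self._rows[n:]
        return out

cursor = FakeCursor([(i, i * 2) for i in range(7)])
batches = list(fetch_in_batches(cursor, batch_size=3))
# Seven rows with batch_size=3 yield batches of sizes 3, 3, 1.
```

Each batch could then be appended to a DataFrame (or written out) before the next one is fetched, keeping peak driver memory bounded by the batch size rather than the table size.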


2 REPLIES

AlliaKhosla
Databricks Employee

Hi Addy,

Greetings!

You can use Delta Sharing to share data across multiple workspaces. Since you want to read tables from another Databricks workspace, Databricks-to-Databricks Delta Sharing would be a good fit.

https://docs.databricks.com/en/data-sharing/read-data-databricks.html

https://docs.databricks.com/en/data-sharing/index.html


addy
New Contributor III

Hi Allia,

Thanks for the helpful links. Unfortunately, the client I am working with has a hard requirement that we use the token; my first instinct was also to set up a Delta Share, but that is not an option here. I am looking for a more efficient way to read the data while still using the token. As my code shows, I currently fetch all the rows and then build a DataFrame from them; ideally I would ingest directly into a PySpark DataFrame.
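For reading directly into a Spark DataFrame with only a token, one possible route is Spark's JDBC reader pointed at the remote warehouse via the Databricks JDBC driver. This is an untested sketch, assuming the driver is installed on the cluster; the URL format below follows the driver's token-auth convention (AuthMech=3, UID=token), but verify the exact properties against the driver version you use.

```python
def databricks_jdbc_url(server_hostname: str, http_path: str) -> str:
    # Build a JDBC URL for the Databricks JDBC driver.
    # AuthMech=3 with UID=token means "authenticate with a personal
    # access token supplied as the password".
    return (
        f"jdbc:databricks://{server_hostname}:443;"
        f"httpPath={http_path};AuthMech=3;UID=token"
    )

url = databricks_jdbc_url(
    "adb-123.azuredatabricks.net", "/sql/1.0/warehouses/hahaha"
)

# With the driver available, Spark could then read the table directly
# (not executed here, since it needs a live cluster):
# df = (spark.read.format("jdbc")
#       .option("url", url)
#       .option("dbtable", "catalog.schema.table")
#       .option("PWD", "<access-token>")
#       .load())
```

This keeps the token-based auth requirement while letting Spark handle the fetch, instead of pulling all rows through a local cursor first.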
