Reading a table from a catalog that is in a different/external workspace
03-04-2024 01:33 PM
I am trying to read a table that is hosted in a different workspace. We have been told to establish a connection to that workspace using a token and consume the table directly.
The code I am using is:
from databricks import sql

connection = sql.connect(
    server_hostname="adb-123.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/hahaha",
    access_token="pass"
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM table")

# Fetch all rows into a list
rows = cursor.fetchall()

# Create a PySpark DataFrame from the list of rows
df = spark.createDataFrame(rows, schema=["A", "B", "C"])

# Close the cursor and connection when done
cursor.close()
connection.close()
Is there a better way of doing this? Is there a way to directly run a Spark read and get the data into a PySpark DataFrame? This method doesn't seem very efficient. Please note that we have to go with the token here and consume the table directly.
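One incremental improvement to the snippet above keeps the token-based connection but pulls the result set as Arrow instead of Python row objects, which is lighter for large results. This is only a sketch: it assumes a recent databricks-sql-connector that exposes fetchall_arrow(), and the hostname, HTTP path, token, and table name are placeholders.

from databricks import sql

with sql.connect(
    server_hostname="adb-123.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/hahaha",
    access_token="pass",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM catalog.schema.my_table")
        # Returns a pyarrow.Table instead of a list of Row objects
        arrow_table = cursor.fetchall_arrow()

# Spark infers the schema from the pandas dtypes
df = spark.createDataFrame(arrow_table.to_pandas())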
03-05-2024 12:33 AM
Hi Addy
Greetings!
You can also use Delta Sharing to share data across multiple workspaces. Since you want to read tables from another workspace, you can use Databricks-to-Databricks Delta Sharing.
https://docs.databricks.com/en/data-sharing/read-data-databricks.html
https://docs.databricks.com/en/data-sharing/index.html
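With Databricks-to-Databricks sharing, once the provider has added the table to a share and granted it to the consuming metastore, the consumer just mounts the share as a catalog and reads it like any other Unity Catalog table. A rough sketch (provider, share, catalog, schema, and table names below are placeholders):

# Run once on the consuming workspace to mount the share as a catalog
spark.sql(
    "CREATE CATALOG IF NOT EXISTS shared_catalog "
    "USING SHARE `provider_name`.`share_name`"
)

# After that, the shared table behaves like a normal Unity Catalog table
df = spark.table("shared_catalog.some_schema.my_table")
df.show()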
03-07-2024 08:48 AM
Hi Allia,
Thanks for the helpful links. Unfortunately, the client I am working with has a hard requirement that we use the token. My first instinct was also to set up a Delta Share, but that is not an option here. I am looking for a more efficient way to read the data while still using the token. As you can see in my code, I fetch all the rows first and then build a DataFrame from them; I am looking for a more efficient approach, ideally ingesting directly into a DataFrame.
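One pattern that might satisfy the token constraint while landing the data directly in a Spark DataFrame is a JDBC read against the remote SQL warehouse. This is only a sketch: it assumes the Databricks JDBC driver (com.databricks.client.jdbc.Driver) is installed on the cluster running the code, and the host, HTTP path, token, and table name are placeholders.

# JDBC URL pointing at the remote workspace's SQL warehouse, authenticating with a PAT (AuthMech=3)
jdbc_url = (
    "jdbc:databricks://adb-123.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=/sql/1.0/warehouses/hahaha"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("user", "token")                        # literal "token" when authenticating with a PAT
    .option("password", "pass")                     # the personal access token itself
    .option("dbtable", "catalog.schema.my_table")   # fully qualified remote table
    .load()
)

The read still goes through the remote warehouse, so for large tables it may help to push a filter down by using the "query" option instead of "dbtable".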

