<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Use Python notebook to read data from Databricks in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112533#M4874</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/127895"&gt;@Advika_&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your answer. I modified to the below according to your recommendation. However, I still received column names, instead of the data, as the result.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", query)
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I saw a solution recommending to import and register the&amp;nbsp;dbsqlDialectClass&amp;nbsp;to the JAR file, but I can't figure out how to do it properly. I appreciate any guidance!&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 14 Mar 2025 04:24:28 GMT</pubDate>
    <dc:creator>cony2025</dc:creator>
    <dc:date>2025-03-14T04:24:28Z</dc:date>
    <item>
      <title>Use Python notebook to read data from Databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/111796#M4849</link>
      <description>&lt;P&gt;I'm very new to Databricks. I hope this is the right place to ask this question.&lt;/P&gt;&lt;P&gt;I want to use PySpark in a notebook to read data from a Databricks database with the below codes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;databricks_host = "adb-xxxx.azuredatabricks.net"
http_path = "/sql/1.0/warehouses/xxxx"
access_token = "dapixxxx"
jdbc_url = f"jdbc:databricks://{databricks_host}:443/default;transportMode=http;ssl=1;httpPath={http_path};AuthMech=3"

query = "(SELECT DISTINCT building_code, city FROM wphub_poc.gold.v_d_building) as subquery"

df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("query", query)
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())

print(df.printSchema())
print(df.show())&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, the df result only contains the literal texts from the SELECT query, like this. Do I do anything wrong and how to fix it? Thank you!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2025-03-04 225915.png" style="width: 259px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15239iAF3E039E0983AFB5/image-dimensions/259x52?v=v2" width="259" height="52" role="button" title="Screenshot 2025-03-04 225915.png" alt="Screenshot 2025-03-04 225915.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2025-03-04 211512.png" style="width: 200px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15240i84CD10CB7B92BB43/image-size/small?v=v2&amp;amp;px=200" role="button" title="Screenshot 2025-03-04 211512.png" alt="Screenshot 2025-03-04 211512.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Mar 2025 07:11:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/111796#M4849</guid>
      <dc:creator>cony2025</dc:creator>
      <dc:date>2025-03-05T07:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Use Python notebook to read data from Databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/111993#M4850</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152009"&gt;@cony2025&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;It seems the issue is due to the &lt;STRONG&gt;query&lt;/STRONG&gt; parameter. When using &lt;STRONG&gt;.option("query")&lt;/STRONG&gt;, Spark automatically wraps your SQL inside an outer SELECT * FROM ( ), which leads to invalid nested subqueries if your original query already includes an alias.&lt;/P&gt;
&lt;P&gt;To avoid this, try using &lt;STRONG&gt;.option("dbtable")&lt;/STRONG&gt; instead. It prevents the automatic wrapping and should resolve the issue.&lt;/P&gt;
&lt;P&gt;Let me know if this doesn't help!&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 11:29:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/111993#M4850</guid>
      <dc:creator>Advika_</dc:creator>
      <dc:date>2025-03-07T11:29:08Z</dc:date>
    </item>
    <item>
      <title>Re: Use Python notebook to read data from Databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112533#M4874</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/127895"&gt;@Advika_&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your answer. I modified to the below according to your recommendation. However, I still received column names, instead of the data, as the result.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", query)
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I saw a solution recommending to import and register the&amp;nbsp;dbsqlDialectClass&amp;nbsp;to the JAR file, but I can't figure out how to do it properly. I appreciate any guidance!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 04:24:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112533#M4874</guid>
      <dc:creator>cony2025</dc:creator>
      <dc:date>2025-03-14T04:24:28Z</dc:date>
    </item>
    <item>
      <title>Re: Use Python notebook to read data from Databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112559#M4875</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152009"&gt;@cony2025&lt;/a&gt;&amp;nbsp;Could you load the whole table as TempView and see what happens&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "wphub_poc.gold.v_d_building")
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())

df.createOrReplaceTempView("building_data")

spark.sql("SELECT DISTINCT building_code, city FROM building_data").show(truncate=False)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 09:20:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112559#M4875</guid>
      <dc:creator>Akash_Varuna</dc:creator>
      <dc:date>2025-03-14T09:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: Use Python notebook to read data from Databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112566#M4876</link>
      <description>&lt;P&gt;I would try changing the query to something like the following, it should return the column names in the table so you can see if the jdbc call is actually returning the data correctly&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT * FROM wphub_poc.gold.v_d_building limit 10&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 10:21:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/use-python-notebook-to-read-data-from-databricks/m-p/112566#M4876</guid>
      <dc:creator>dna1</dc:creator>
      <dc:date>2025-03-14T10:21:30Z</dc:date>
    </item>
  </channel>
</rss>

