Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Use Python notebook to read data from Databricks

cony2025
New Contributor II

I'm very new to Databricks. I hope this is the right place to ask this question.

I want to use PySpark in a notebook to read data from a Databricks database with the code below.

 

databricks_host = "adb-xxxx.azuredatabricks.net"
http_path = "/sql/1.0/warehouses/xxxx"
access_token = "dapixxxx"
jdbc_url = f"jdbc:databricks://{databricks_host}:443/default;transportMode=http;ssl=1;httpPath={http_path};AuthMech=3"

query = "(SELECT DISTINCT building_code, city FROM wphub_poc.gold.v_d_building) as subquery"

df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("query", query)
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())

# printSchema() and show() print directly and return None,
# so wrapping them in print() adds a stray "None" line.
df.printSchema()
df.show()

 

However, the df result only contains the literal texts from the SELECT query, like this. Am I doing anything wrong, and how can I fix it? Thank you!

[Attachment: Screenshot 2025-03-04 225915.png]

 

[Attachment: Screenshot 2025-03-04 211512.png]

4 REPLIES

Advika_
Databricks Employee

Hello, @cony2025!

It seems the issue is due to the query parameter. When using .option("query"), Spark automatically wraps your SQL inside an outer SELECT * FROM ( ), which leads to invalid nested subqueries if your original query already includes an alias.

To avoid this, try using .option("dbtable") instead. It prevents the automatic wrapping and should resolve the issue.
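The wrapping described above can be sketched in plain Python. This is a rough illustration, not Spark's exact internals, and the generated alias name here is illustrative:

```python
# The user's SQL already carries its own alias ("as subquery"):
user_query = "(SELECT DISTINCT building_code, city FROM wphub_poc.gold.v_d_building) as subquery"

# With .option("query", ...), Spark effectively wraps it in an outer SELECT,
# producing a second level of derived-table nesting:
wrapped = f"SELECT * FROM ({user_query}) SPARK_GEN_SUBQ_0"
print(wrapped)
# The already-aliased subquery now sits inside another derived table,
# which some drivers/warehouses reject or mishandle.
```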

Let me know if this doesn't help!

cony2025
New Contributor II

Hi @Advika_ 

Thanks for your answer. I modified the code as below per your recommendation. However, I still received the column names, instead of the data, as the result.

 

df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", query)
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())

 

I saw a solution recommending importing and registering the dbsqlDialectClass from the JAR file, but I can't figure out how to do it properly. I'd appreciate any guidance!
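For context on why a custom dialect is suggested at all, here is a hedged illustration of the likely root cause (plain Python, not Spark source verbatim; the table name is taken from the thread):

```python
# Spark's generic JDBC dialect quotes column identifiers with double quotes:
cols = ["building_code", "city"]
generated = "SELECT " + ", ".join(f'"{c}"' for c in cols) + " FROM wphub_poc.gold.v_d_building"
print(generated)
# In Databricks SQL, double-quoted tokens are string LITERALS, not identifiers,
# so every returned row contains the texts 'building_code' and 'city' -- exactly
# the symptom in the screenshots. A dialect that quotes identifiers with
# backticks instead is what registering a Databricks-specific dialect class
# is meant to accomplish.
```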

Hi @cony2025, could you load the whole table, register it as a temp view, and see what happens?

df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "wphub_poc.gold.v_d_building")
    .option("user", "token")
    .option("password", access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("fetchsize", "10000")
    .load())

df.createOrReplaceTempView("building_data")

spark.sql("SELECT DISTINCT building_code, city FROM building_data").show(truncate=False)

 

dna1
New Contributor II

I would try changing the query to something like the following. It should return all columns with a few rows, so you can see whether the JDBC call is actually returning data correctly.

SELECT * FROM wphub_poc.gold.v_d_building LIMIT 10

 
