Using JDBC to connect to a Databricks default cluster and read a table into a PySpark DataFrame: every value in the column comes back as the column name
02-22-2023 11:45 AM
I used code like the following to connect over JDBC to the Databricks default cluster and read a table into a PySpark DataFrame:
url = 'jdbc:databricks://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=[path];UID=token;PWD=[your_access_token]'
driver_class = 'com.databricks.client.jdbc.Driver'
# create a DataFrame by querying the Databricks cluster over JDBC
df_1 = spark.read.format("jdbc")\
.option("url", url)\
.option("driver", driver_class)\
.option("dbtable", 'my_table2')\
.load()
df_1.show()
The resulting df_1 DataFrame looks like this:
+----+
|name|
+----+
|name|
|name|
|name|
+----+
but the expected result is:
+----+
|name|
+----+
|John|
|Jane|
|Bob |
+----+
I also tested with the following code:
df_2 = spark.sql('select * from my_table2')
df_2.show()
and the resulting DataFrame is correct. Please advise. Thank you!
02-22-2023 08:24 PM
Hi @yu zhang,
Try adding your database name to the code as well; a minimal sketch is below.
I am using the same code with the database name and it works fine for me.
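For illustration, here is a hedged sketch of what "adding the database name" could look like: the table name in the dbtable option is qualified with the schema. The name my_database is a placeholder, and url / driver_class are assumed to be defined as in the original snippet.
# sketch: qualify the table with the database/schema name ("my_database" is a placeholder)
df_1 = spark.read.format("jdbc") \
    .option("url", url) \
    .option("driver", driver_class) \
    .option("dbtable", 'my_database.my_table2') \
    .load()
df_1.show()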
02-27-2023 07:57 AM
Does spark.read.jdbc have a database name option? Could you tell me how to specify the database name with this function? I thought the database name was included in the JDBC URL option.
04-24-2023 09:02 PM
@yu zhang :
It looks like the issue with the first code snippet you provided is that it is not specifying the correct query to retrieve the data from your database.
When using the load() method with the jdbc data source, the dbtable option accepts either a table name or a parenthesized SQL subquery that selects the data you want to load into the DataFrame.
Assuming that your my_table2 table contains a column named name with the values you listed, you can modify your code as follows to retrieve the correct data:
url = 'jdbc:databricks://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=[path];UID=token;PWD=[your_access_token]'
driver_class = 'com.databricks.client.jdbc.Driver'
# create a DataFrame by querying the Databricks cluster over JDBC
df_1 = spark.read.format("jdbc")\
.option("url", url)\
.option("driver", driver_class)\
.option("dbtable", '(select name from my_table2) as tmp')\
.load()
df_1.show()
This code passes a SQL subquery to the dbtable option that selects only the name column from your my_table2 table. The resulting DataFrame should contain the data you expect.
Alternatively, you can use the spark.sql method to directly execute a SQL query against your database and load the results into a DataFrame, as you showed in your second code snippet:
df_2 = spark.sql('select * from my_table2')
df_2.show()
This code should also retrieve the correct data from your database.
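As a further hedged sketch, Spark's JDBC data source also accepts a query option (mutually exclusive with dbtable), which avoids wrapping the statement in a subquery alias; url and driver_class are assumed to be the same as above, and the table/column names come from this thread.
# sketch: use the "query" option instead of a subquery in "dbtable"
df_3 = spark.read.format("jdbc") \
    .option("url", url) \
    .option("driver", driver_class) \
    .option("query", 'select name from my_table2') \
    .load()
df_3.show()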

