Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Consuming data from a Databricks [Hive metastore] SQL endpoint using PySpark

Swostiman
New Contributor II

I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, every value in the table comes back equal to its column name.

Even when I just try to show the data, it throws an error if the column type is not string.

Error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 34) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to BigDecimal.

jdbc_url = "jdbc:databricks://XXXX:443/default;transportMode=http;ssl=1;httpPath=xxxx;password=<pat token>"
table_name = "xxxx"
 
df = spark.read.format("jdbc") \
     .option("url", jdbc_url) \
     .option("dbtable", table_name) \
     .option("driver", "com.databricks.client.jdbc.Driver") \
     .load()
 
df.printSchema()
>>>>>> Output>>>>>>
root
 |-- Description: string (nullable = true)
 |-- Volume: double (nullable = true)
 
 
df.show()
>>>>>> Output>>>>>>
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to double.
 
 
df.select('Description').show(10, False)
>>>>>> Output>>>>>>
+-----------+
|Description|
+-----------+
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
+-----------+
only showing top 10 rows
 

Note: Everything works fine if I use "sql.connect" and consume the data using a "cursor".

But when using the Spark JDBC connection, I am facing this issue. Can someone help me here?

5 REPLIES

-werners-
Esteemed Contributor III

Have you included the JDBC driver for your particular database on the Spark classpath?

Example for Postgres:

./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
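For PySpark the same idea applies. A hedged sketch of the launch command, assuming the driver jar is named DatabricksJDBC42.jar and sits in the current directory (adjust the name to whatever jar you actually downloaded):

```shell
# Put the JDBC driver jar on both the driver classpath and the job's jars.
# The jar name below is an assumption; use your downloaded driver's file name.
./bin/pyspark --driver-class-path DatabricksJDBC42.jar --jars DatabricksJDBC42.jar
```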

PS. I just noticed you want to read from Databricks SQL, not some other kind of database.

Can you try with an older JDBC driver version (2.6.22 or earlier)?

Also, I am not sure the driver class you use is the correct one. Locally I use com.simba.spark.jdbc.Driver.

Or download the JDBC driver and add it to the Spark classpath.
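To make that concrete, here is a minimal sketch of the read options using the Simba driver class mentioned above. The jdbc:spark:// URL layout and the AuthMech=3 / UID=token / PWD=&lt;token&gt; auth pattern are assumptions based on the Simba Spark driver's conventions; verify them against your driver's documentation:

```python
def databricks_jdbc_options(host, http_path, token, table,
                            driver="com.simba.spark.jdbc.Driver"):
    """Build the option map for spark.read.format("jdbc").

    host, http_path, and token are placeholders for your workspace
    values. The URL layout is an assumption based on the Simba Spark
    JDBC driver's conventions; check your driver's docs.
    """
    url = (
        f"jdbc:spark://{host}:443/default;transportMode=http;ssl=1;"
        f"httpPath={http_path};AuthMech=3;UID=token;PWD={token}"
    )
    return {"url": url, "dbtable": table, "driver": driver}

# Usage on a cluster (placeholders, not real values):
# df = (spark.read.format("jdbc")
#         .options(**databricks_jdbc_options(
#             "<host>", "<http-path>", "<pat token>", "<table>"))
#         .load())
```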

Swostiman
New Contributor II

Yes, I have added them and have also tried with an older version of the JDBC driver, but the result is the same.

Also, can you provide a link to download com.simba.spark.jdbc.Driver so that I can test with it?

Ajay-Pandey
Esteemed Contributor III

Hi @Swostiman Mohapatra,

Use the code below to access the data using the Databricks SQL Connector:

# pip install databricks-sql-connector
 
from databricks import sql
 
conn = sql.connect( 
                     server_hostname = "<Host_name>",
                     http_path = "<Path>",
                     access_token = "<Access_token>")
 
cursor = conn.cursor()
 
cursor.execute("SELECT * from P123")
display(cursor.fetchall())
 
cursor.close()
conn.close()
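Since cursor.fetchall() returns rows as sequences and cursor.description carries the column names (standard DB-API 2.0 behavior), a small helper can turn the result into plain dicts for further processing. A minimal sketch, independent of any Databricks-specific API:

```python
def rows_to_dicts(description, rows):
    """Convert DB-API query results to a list of dicts.

    `description` is the cursor.description sequence (the first element
    of each entry is the column name); `rows` is what fetchall() returned.
    """
    names = [col[0] for col in description]
    return [dict(zip(names, row)) for row in rows]

# Example with literal values shaped like a fetchall() result:
records = rows_to_dicts(
    [("Description", None), ("Volume", None)],
    [("widget", 1.5), ("gadget", 2.0)],
)
```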

Ajay Kumar Pandey

Anonymous
Not applicable

Hi @Swostiman Mohapatra,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

sucan
New Contributor II

I encountered the same issue, and downgrading to driver version 2.6.22 resolved it for me.
