05-25-2023 02:57 AM
I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, every value in the table is the same as its column name.
Even just showing the data throws an error if the column type is not string.
Error :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 34) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to BigDecimal.
jdbc_url = "jdbc:databricks://XXXX:443/default;transportMode=http;ssl=1;httpPath=xxxx;password=<pat token>"
table_name = "xxxx"

df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", table_name) \
    .option("driver", "com.databricks.client.jdbc.Driver") \
    .load()
df.printSchema()
>>>>>> Output>>>>>>
root
|-- Description: string (nullable = true)
|-- Volume: double (nullable = true)
df.show()
>>>>>> Output>>>>>>
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to double.
df.select('Description').show(10, False)
>>>>>> Output>>>>>>
+-----------+
|Description|
+-----------+
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
+-----------+
only showing top 10 rows
Note: Everything works fine if I use "sql.connect" and consume the data using a "cursor".
But when I try it with a Spark JDBC connection, I run into this issue. Can someone help me here?
05-25-2023 03:21 AM
Have you included the JDBC driver for your particular database on the Spark classpath?
Example for Postgres:
./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
PS. I just noticed you want to read from Databricks SQL, not some other kind of database.
Can you try using another JDBC driver version (2.6.22 or earlier)?
Also, I'm not sure the driver you use is the correct one. Locally I use com.simba.spark.jdbc.Driver.
Or download the JDBC driver and add it to the Spark classpath.
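Following the Postgres example above, the equivalent invocation for the Databricks driver would look something like this. The jar filename here is only an assumption for illustration; use whatever jar the downloaded driver archive actually contains:

```shell
# Put the downloaded Databricks JDBC driver jar on both the driver
# classpath and the executor classpath (jar name is an example only).
./bin/pyspark \
  --driver-class-path DatabricksJDBC42.jar \
  --jars DatabricksJDBC42.jar
```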
05-25-2023 03:55 AM
Yes, I have added them and have also tried with an older version of the JDBC driver, but the result is the same.
Also, can you provide me with a link to download com.simba.spark.jdbc.Driver so I can test with it?
05-25-2023 09:21 PM
Hi @Swostiman Mohapatra ,
Use the code below to access the data using the Databricks SQL connector -

pip install databricks-sql-connector

from databricks import sql

conn = sql.connect(
    server_hostname = "<Host_name>",
    http_path = "<Path>",
    access_token = "<Access_token>")

cursor = conn.cursor()
cursor.execute("SELECT * from P123")
display(cursor.fetchall())
cursor.close()
conn.close()
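As a side note, the rows returned by cursor.fetchall() can be loaded into pandas for local inspection. A minimal sketch, with hypothetical rows standing in for the live query result (the column names match the schema shown earlier in this thread):

```python
import pandas as pd

# Hypothetical rows as returned by cursor.fetchall() (tuples of column
# values); in real code these would come from the live SQL endpoint.
rows = [("widget", 1.5), ("gadget", 2.0)]
columns = ["Description", "Volume"]  # could also be read from cursor.description

df = pd.DataFrame(rows, columns=columns)
print(df.shape)  # (2, 2)
```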
05-28-2023 05:29 PM
Hi @Swostiman Mohapatra
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
08-11-2023 05:05 PM
Encountered the same issue; downgrading the JDBC driver to 2.6.22 resolved it for me.