05-25-2023 02:57 AM
I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, every value in the table is the same as its column name.
Even just showing the data throws an error if the column type is not string.
Error :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 34) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to BigDecimal.
jdbc_url = "jdbc:databricks://XXXX:443/default;transportMode=http;ssl=1;httpPath=xxxx;password=<pat token>"
table_name = "xxxx"

df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", table_name) \
    .option("driver", "com.databricks.client.jdbc.Driver") \
    .load()
df.printSchema()
>>>>>> Output>>>>>>
root
|-- Description: string (nullable = true)
|-- Volume: double (nullable = true)
df.show()
>>>>>> Output>>>>>>
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to double.
df.select('Description').show(10, False)
>>>>>> Output>>>>>>
+-----------+
|Description|
+-----------+
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
+-----------+
only showing top 10 rows
Note: Everything works fine if I use "sql.connect" and consume the data using a "cursor".
But when I try it with a Spark JDBC connection, I run into this issue. Can someone help me here?
05-25-2023 03:21 AM
Have you included the JDBC driver for your particular database on the Spark classpath?
Example for Postgres:
./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
PS. I just noticed you want to read from Databricks SQL, not some other kind of database.
Can you try using another JDBC driver version (2.6.22 or earlier)?
Also, I'm not sure the driver you use is the correct one. Locally I use com.simba.spark.jdbc.Driver.
Or download the JDBC driver and add it to the Spark classpath.
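Following the Postgres example above, the equivalent invocation for the Databricks driver would look something like this. The jar filename here is only an assumption for illustration; use whatever jar the downloaded driver archive actually contains:

```shell
# Put the downloaded Databricks JDBC driver jar on both the driver
# classpath and the executor classpath (jar name is an example only).
./bin/pyspark \
  --driver-class-path DatabricksJDBC42.jar \
  --jars DatabricksJDBC42.jar
```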
05-25-2023 03:55 AM
Yes, I have added them and have also tried with an older version of the JDBC driver, but the result is the same.
Also, can you provide me with a link to download com.simba.spark.jdbc.Driver so I can test with it?
05-25-2023 09:21 PM
Hi @Swostiman Mohapatra ,
Use the code below to access the data using the Databricks SQL connector -

pip install databricks-sql-connector

from databricks import sql

conn = sql.connect(
    server_hostname = "<Host_name>",
    http_path = "<Path>",
    access_token = "<Access_token>")

cursor = conn.cursor()
cursor.execute("SELECT * from P123")
display(cursor.fetchall())
cursor.close()
conn.close()
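As a side note, the rows returned by cursor.fetchall() can be loaded into pandas for local inspection. A minimal sketch, with hypothetical rows standing in for the live query result (the column names match the schema shown earlier in this thread):

```python
import pandas as pd

# Hypothetical rows as returned by cursor.fetchall() (tuples of column
# values); in real code these would come from the live SQL endpoint.
rows = [("widget", 1.5), ("gadget", 2.0)]
columns = ["Description", "Volume"]  # could also be read from cursor.description

df = pd.DataFrame(rows, columns=columns)
print(df.shape)  # (2, 2)
```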
05-28-2023 05:29 PM
Hi @Swostiman Mohapatra
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
08-11-2023 05:05 PM
Encountered the same issue; downgrading the JDBC driver to 2.6.22 resolved it for me.