<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4001#M862</link>
    <description>&lt;P&gt;Have you included the JDBC driver for your particular database on the Spark classpath?&lt;/P&gt;&lt;P&gt;Example for Postgres:&lt;/P&gt;&lt;P&gt;&lt;I&gt;./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html" alt="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PS. I just noticed you want to read from Databricks SQL, not from some other database.&lt;/P&gt;&lt;P&gt;Can you try another JDBC driver version (2.6.22 or earlier), with the JDBC URL for that version?&lt;/P&gt;&lt;P&gt;Also, I am not sure the driver class you use is the correct one. Locally I use com.simba.spark.jdbc.Driver.&lt;/P&gt;&lt;P&gt;Or download the JDBC driver and add it to the Spark classpath.&lt;/P&gt;</description>
    <pubDate>Thu, 25 May 2023 10:21:06 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-05-25T10:21:06Z</dc:date>
    <item>
      <title>Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4000#M861</link>
      <description>&lt;P&gt;I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, every value in the table is the same as its column name.&lt;/P&gt;&lt;P&gt;Even when I just try to show the data, I get an error if the column type is not string.&lt;/P&gt;&lt;P&gt;Error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 34) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to BigDecimal.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;jdbc_url = "jdbc:databricks://XXXX:443/default;transportMode=http;ssl=1;httpPath=xxxx;password=&amp;lt;pat token&amp;gt;"
table_name = "xxxx"
df = spark.read.format("jdbc") \
     .option("url", jdbc_url) \
     .option("dbtable", table_name) \
     .option("driver", "com.databricks.client.jdbc.Driver") \
     .load()
df.printSchema()
&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Output&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;
root
 |-- Description: string (nullable = true)
 |-- Volume: double (nullable = true)
df.show()
&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Output&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.4 executor driver): java.sql.SQLDataException: [Databricks][JDBC](10140) Error converting value to double.
df.select('Description').show(10, False)
&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Output&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;
+-----------+
|Description|
+-----------+
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
|Description|
+-----------+
only showing top 10 rows
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note: Everything works fine if I use "sql.connect" and consume the data using a "cursor".&lt;/P&gt;&lt;P&gt;But with a Spark JDBC connection, I am facing this issue. Can someone help me here?&lt;/P&gt;</description>
      <pubDate>Thu, 25 May 2023 09:57:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4000#M861</guid>
      <dc:creator>Swostiman</dc:creator>
      <dc:date>2023-05-25T09:57:43Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4001#M862</link>
      <description>&lt;P&gt;Have you included the JDBC driver for your particular database on the Spark classpath?&lt;/P&gt;&lt;P&gt;Example for Postgres:&lt;/P&gt;&lt;P&gt;&lt;I&gt;./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html" alt="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PS. I just noticed you want to read from Databricks SQL, not from some other database.&lt;/P&gt;&lt;P&gt;Can you try another JDBC driver version (2.6.22 or earlier), with the JDBC URL for that version?&lt;/P&gt;&lt;P&gt;Also, I am not sure the driver class you use is the correct one. Locally I use com.simba.spark.jdbc.Driver.&lt;/P&gt;&lt;P&gt;Or download the JDBC driver and add it to the Spark classpath.&lt;/P&gt;</description>
      <pubDate>Thu, 25 May 2023 10:21:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4001#M862</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-05-25T10:21:06Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4002#M863</link>
      <description>&lt;P&gt;Yes, I have added them and have also tried an older version of the JDBC driver, but the result is the same.&lt;/P&gt;&lt;P&gt;Also, can you provide a download link for com.simba.spark.jdbc.Driver so that I can test with it?&lt;/P&gt;</description>
      <pubDate>Thu, 25 May 2023 10:55:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4002#M863</guid>
      <dc:creator>Swostiman</dc:creator>
      <dc:date>2023-05-25T10:55:04Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4003#M864</link>
      <description>&lt;P&gt;Hi @Swostiman Mohapatra,&lt;/P&gt;&lt;P&gt;Use the code below to access the data with the Databricks SQL Connector for Python:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Install the connector first (in a Databricks notebook: %pip install databricks-sql-connector)
# pip install databricks-sql-connector

from databricks import sql

# Replace the placeholders with your SQL endpoint's connection details.
conn = sql.connect(
    server_hostname="&amp;lt;Host_name&amp;gt;",
    http_path="&amp;lt;Path&amp;gt;",
    access_token="&amp;lt;Access_token&amp;gt;",
)

cursor = conn.cursor()

# Run the query and fetch all rows into memory.
cursor.execute("SELECT * from P123")
display(cursor.fetchall())  # display() is available in Databricks notebooks; use print() elsewhere

# Release the cursor and the connection.
cursor.close()
conn.close()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 26 May 2023 04:21:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4003#M864</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2023-05-26T04:21:36Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4004#M865</link>
      <description>&lt;P&gt;Hi @Swostiman Mohapatra&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 May 2023 00:29:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/4004#M865</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-05-29T00:29:45Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming data from databricks[Hive metastore] sql endpoint using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/39700#M27058</link>
      <description>&lt;P&gt;I encountered the same issue, and downgrading the JDBC driver to 2.6.22 resolved it.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Aug 2023 00:05:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/consuming-data-from-databricks-hive-metastore-sql-endpoint-using/m-p/39700#M27058</guid>
      <dc:creator>sucan</dc:creator>
      <dc:date>2023-08-12T00:05:04Z</dc:date>
    </item>
  </channel>
</rss>

