03-01-2022 04:13 AM
Hello,
here is a small code-snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
spark.sql('SHOW PARTITIONS database.table').show()
The output inside the Databricks-Notebook:
+-------------+-------+--------------------+
|projectNumber|plantId| name|
+-------------+-------+--------------------+
| xxxx| P0|***.yyyy............|
| yyyy| P2|***.yyyy............|
...
When I run the same code as above in Visual Studio Code, connected to the same cluster through Databricks-Connect, I receive this output:
+---------+
|partition|
+---------+
| xxxx|
| yyyy|
...
This output has the wrong column name and shows only the first partition column (projectNumber); plantId and name are missing.
This is strange. Everything is identical so the output should be the same.
I receive the correct partitions through sql-describe in both databricks-connect and databricks:
spark.sql('describe table database.table').show()
+--------------+-------------+-------+
| col_name| data_type|comment|
+--------------+-------------+-------+
|# Partitioning| | |
| Part 0|projectNumber| |
| Part 1| plantId| |
| Part 2| name| |
+--------------+-------------+-------+
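Since the describe output is correct in both environments, the partition columns can be read from it directly. The following is only a minimal sketch: partition_columns is a hypothetical helper name, and it assumes the rows have the (col_name, data_type, comment) layout and the "# Partitioning" / "Part N" markers shown in the output above.

```python
def partition_columns(describe_rows):
    """Return the partition column names found in DESCRIBE TABLE rows.

    Assumes each row is a (col_name, data_type, comment) tuple and that the
    partition section starts with a "# Partitioning" row followed by
    "Part N" rows, as in the output shown in this thread.
    """
    columns = []
    in_partitioning = False
    for col_name, data_type, _comment in describe_rows:
        name = (col_name or "").strip()
        if name == "# Partitioning":
            # Start of the partition section.
            in_partitioning = True
            continue
        if in_partitioning:
            if name.startswith("Part "):
                # The partition column name sits in the data_type field.
                columns.append((data_type or "").strip())
            else:
                # First row that is not "Part N" ends the section.
                break
    return columns
```

With PySpark this could be called as partition_columns(spark.sql('describe table database.table').collect()), because each collected Row unpacks like a tuple.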
The table is a delta-table, located in an azure blob-storage.
I tried to refresh the table but this makes no difference.
I found a difference in the Spark-UI SQL tab.
There are 3 queries for the db-connect run and 4 for the databricks run.
The physical execution plan is identical but the second query "Execute ShowPartitionsDeltaCommand" is missing in the db-connect run.
Queries for db-connect:
Queries for databricks:
I don't know why or how, but the other two partition columns get lost with the db-connect query.
Any ideas?
03-15-2022 03:08 AM
Hi @Stefan Plank
There seems to be an issue with Databricks Connect and SQL queries. Could you please try the SQL connector?
More info: https://docs.databricks.com/dev-tools/python-sql-connector.html
The SQL connector is usually recommended if you develop in Python and mainly run SQL queries.
More info: https://docs.databricks.com/dev-tools/databricks-connect.html#overview
Let me know if this works for you.
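For reference, a minimal sketch of running the same query through the Databricks SQL Connector for Python (pip install databricks-sql-connector). The helper names fetch_partitions, connect_from_env and main, as well as the environment variable names, are assumptions made for illustration; the hostname, HTTP path and token have to come from your own workspace.

```python
import os
from contextlib import closing


def fetch_partitions(connection, table_name):
    """Run SHOW PARTITIONS on a DB-API style connection; return (columns, rows)."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(f"SHOW PARTITIONS {table_name}")
        # cursor.description has one entry per result column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    return columns, rows


def connect_from_env():
    """Open a connection with the Databricks SQL Connector.

    The environment variable names are assumptions; set them to the values
    of your own workspace and SQL endpoint or cluster.
    """
    # Imported lazily so fetch_partitions has no hard dependency on the package.
    from databricks import sql

    return sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )


def main():
    # Print the column names first, then one tuple per partition.
    with closing(connect_from_env()) as connection:
        columns, rows = fetch_partitions(connection, "database.table")
        print(columns)
        for row in rows:
            print(row)
```

Calling main() prints the column names followed by the partition rows, which makes it easy to check whether all three partition columns are returned.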
03-02-2022 12:38 AM
The docs say the SQL API is supported for Delta Lake, so I would assume both return the same results.
But clearly that is not the case.
What version of db-connect do you use?
03-02-2022 02:53 AM
db-connect version 9.1.9
cluster db-runtime 9.1 LTS
Python 3.8.10
04-05-2022 04:20 PM
Hi @Stefan Plank ,
Just checking if you still need help. Did @Gaurav Rupnar's recommendation help you resolve your issue?
04-05-2022 11:16 PM
Hi @Jose Gonzalez ,
yes the SQL-Connector works fine. Thank you!
04-11-2022 11:29 AM
Hi @Stefan Plank ,
Thank you for your reply. I will mark the response as "best".