03-01-2022 04:13 AM
Hello,
here is a small code-snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
spark.sql('SHOW PARTITIONS database.table').show()
The output inside the Databricks-Notebook:
+-------------+-------+--------------------+
|projectNumber|plantId| name|
+-------------+-------+--------------------+
| xxxx| P0|***.yyyy............|
| yyyy| P2|***.yyyy............|
...
When I run the same code as above in Visual Studio Code, connected to the same cluster through Databricks-Connect, I receive this output:
+---------+
|partition|
+---------+
| xxxx|
| yyyy|
...
This output has the wrong column name and shows only the values of the first partition column (projectNumber).
This is strange. Everything is identical so the output should be the same.
I receive the correct partitions through sql-describe in both databricks-connect and databricks:
spark.sql('describe table database.table').show()
+--------------+-------------+-------+
| col_name| data_type|comment|
+--------------+-------------+-------+
|# Partitioning| | |
| Part 0|projectNumber| |
| Part 1| plantId| |
| Part 2| name| |
+--------------+-------------+-------+
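Since the describe output is the same in both environments, the partition column names can be read from it as a stopgap. The following is only a sketch: the helper name `partition_columns_from_describe` is made up for illustration, and it assumes the rows have the `# Partitioning` / `Part N` layout shown above.

```python
def partition_columns_from_describe(rows):
    """Return partition column names from DESCRIBE TABLE rows.

    Each row is expected to be a (col_name, data_type, comment) triple,
    e.g. the result of spark.sql('describe table database.table').collect().
    Assumes the '# Partitioning' / 'Part N' layout shown in the post.
    """
    columns = []
    in_partitioning = False
    for col_name, data_type, _comment in rows:
        name = (col_name or "").strip()
        if name == "# Partitioning":
            # Start of the partitioning section.
            in_partitioning = True
            continue
        if in_partitioning:
            if name.startswith("Part "):
                # In this section the data_type column holds the column name.
                columns.append((data_type or "").strip())
            else:
                # Any other row ends the partitioning section.
                break
    return columns
```

This only recovers the partition column names, not the partition values that SHOW PARTITIONS lists.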
The table is a Delta table located in Azure Blob Storage.
I tried to refresh the table but this makes no difference.
I found a difference in the Spark-UI SQL tab.
There are 3 queries for the db-connect run and 4 for the databricks run.
The physical execution plan is identical but the second query "Execute ShowPartitionsDeltaCommand" is missing in the db-connect run.
Queries for db-connect:
Queries for databricks:
I don't know why or how, but the other 2 partition columns get lost with the db-connect query.
Any ideas?
03-15-2022 03:08 AM
Hi @Stefan Plank
There seems to be an issue with Databricks Connect and SQL queries. Could you please try the SQL connector?
More info: https://docs.databricks.com/dev-tools/python-sql-connector.html
It is usually recommended to use the SQL connector if you are developing in Python with SQL queries.
More info: https://docs.databricks.com/dev-tools/databricks-connect.html#overview
Let me know if this works for you.
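For illustration, running the same statement through the SQL connector might look roughly like the sketch below. The helper names and the environment variable names are placeholders chosen for this example, not something from the thread; `sql.connect` with `server_hostname`, `http_path` and `access_token` comes from the `databricks-sql-connector` package.

```python
import os


def show_partitions(connection, table_name):
    """Run SHOW PARTITIONS on a DB-API 2.0 connection; return (columns, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(f"SHOW PARTITIONS {table_name}")
        # cursor.description holds one tuple per result column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


def run_against_databricks():
    """Connect with the SQL connector and print the partitions of database.table."""
    # Imported here so show_partitions can be used without the package installed.
    from databricks import sql  # pip install databricks-sql-connector

    # Placeholder environment variable names; set them to your workspace values.
    connection = sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )
    try:
        columns, rows = show_partitions(connection, "database.table")
        print(columns)
        for row in rows:
            print(row)
    finally:
        connection.close()
```

Call `run_against_databricks()` once the three environment variables are set. The query helper is kept separate so it can be checked with any DB-API style connection.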
03-02-2022 12:38 AM
The docs say the SQL API is supported for Delta Lake, so I would assume both return the same results.
But clearly that is not the case.
What version of db-connect do you use?
03-02-2022 02:53 AM
db-connect version 9.1.9
cluster db-runtime 9.1 LTS
Python 3.8.10
04-05-2022 04:20 PM
Hi @Stefan Plank,
Just checking if you still need help. Did @Gaurav Rupnar's recommendation help you resolve your issue?
04-05-2022 11:16 PM
Hi @Jose Gonzalez,
Yes, the SQL connector works fine. Thank you!
04-11-2022 11:29 AM
Hi @Stefan Plank,
Thank you for your reply. I will mark the response as "best".