03-01-2022 04:13 AM
Hello,
Here is a small code snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
spark.sql('SHOW PARTITIONS database.table').show()
The output inside the Databricks-Notebook:
+-------------+-------+--------------------+
|projectNumber|plantId|                name|
+-------------+-------+--------------------+
|         xxxx|     P0|***.yyyy............|
|         yyyy|     P2|***.yyyy............|
...
When I run the same code as above in Visual Studio Code, connected to the same cluster through Databricks-Connect, I receive this output:
+---------+
|partition|
+---------+
|     xxxx|
|     yyyy|
...
This output has the wrong column name and shows only the first partition column (projectNumber); plantId and name are missing.
This is strange. Everything else is identical, so the output should be the same.
I get the correct partition columns from a SQL DESCRIBE in both Databricks Connect and Databricks:
spark.sql('describe table database.table').show()
+--------------+-------------+-------+
|      col_name|    data_type|comment|
+--------------+-------------+-------+
|# Partitioning|             |       |
|        Part 0|projectNumber|       |
|        Part 1|      plantId|       |
|        Part 2|         name|       |
+--------------+-------------+-------+
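The partition columns reported by DESCRIBE above could, in principle, be used to list partition values without SHOW PARTITIONS. The following is only a sketch of that idea, not something confirmed in this thread: the helper names `partition_columns` and `distinct_partitions` are made up for illustration, the row layout is assumed to match the DESCRIBE output shown above, and whether the query behaves the same through Databricks Connect is untested.

```python
def partition_columns(describe_rows):
    """Extract partition column names from DESCRIBE TABLE rows.

    Each row is assumed to be a (col_name, data_type, comment) tuple,
    with the partition columns listed as "Part N" rows after a
    "# Partitioning" marker row, as in the output above.
    """
    columns = []
    in_partitioning = False
    for col_name, data_type, _comment in describe_rows:
        name = col_name.strip()
        if name == "# Partitioning":
            in_partitioning = True
        elif in_partitioning and name.startswith("Part "):
            # For "Part N" rows the partition column name is in data_type.
            columns.append(data_type.strip())
    return columns


def distinct_partitions(spark, table_name):
    """Hypothetical workaround: select the distinct partition value combinations.

    `spark` is an existing SparkSession. This reads from the table itself,
    so it may be slower than SHOW PARTITIONS.
    """
    rows = [tuple(r) for r in spark.sql(f"DESCRIBE TABLE {table_name}").collect()]
    return spark.table(table_name).select(*partition_columns(rows)).distinct()
```

With the session from the snippet above, `distinct_partitions(spark, 'database.table').show()` would print one row per combination of projectNumber, plantId and name.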
The table is a Delta table located in Azure Blob Storage.
I tried refreshing the table, but this made no difference.
I found a difference in the Spark-UI SQL tab.
There are 3 queries for the db-connect run and 4 for the databricks run.
The physical execution plan is identical but the second query "Execute ShowPartitionsDeltaCommand" is missing in the db-connect run.
[Screenshot: queries for db-connect]
[Screenshot: queries for databricks]
I don't know why or how, but the other two partition columns get lost in the db-connect query.
Any ideas?
03-15-2022 03:08 AM
Hi @Stefan Plank
There seems to be an issue with Databricks Connect and SQL queries. Could you please try the SQL connector?
More info: https://docs.databricks.com/dev-tools/python-sql-connector.html
It is usually recommended to use the SQL connector if you are developing in Python with SQL queries.
More info: https://docs.databricks.com/dev-tools/databricks-connect.html#overview
Let me know if this works for you.
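For reference, a minimal sketch of running the same statement through the Databricks SQL Connector for Python (`pip install databricks-sql-connector`). The environment variable names and the helper `show_partitions` are assumptions made for this example; the hostname, HTTP path and access token have to come from your own workspace.

```python
import os


def show_partitions(cursor, table_name):
    """Run SHOW PARTITIONS on a DB-API style cursor.

    Returns the result column names and the rows as plain tuples.
    """
    cursor.execute(f"SHOW PARTITIONS {table_name}")
    columns = [desc[0] for desc in cursor.description]
    rows = [tuple(row) for row in cursor.fetchall()]
    return columns, rows


def main():
    # Imported here so the helper above can be used without the package.
    from databricks import sql

    # Assumed environment variable names; set them to your workspace values.
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            columns, rows = show_partitions(cursor, "database.table")
            print(columns)
            for row in rows:
                print(row)
```

Calling `main()` prints the column names followed by one tuple per partition, which makes it easy to check whether all three partition columns come back.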
03-02-2022 12:38 AM
The docs say the SQL API is supported for Delta Lake, so I would assume both return the same results.
But clearly that is not the case.
What version of db-connect do you use?
03-02-2022 02:53 AM
db-connect version 9.1.9
cluster db-runtime 9.1 LTS
Python 3.8.10
04-05-2022 04:20 PM
Hi @Stefan Plank,
Just checking if you still need help. Did @Gaurav Rupnar's recommendation help you resolve your issue?
04-05-2022 11:16 PM
Hi @Jose Gonzalez,
Yes, the SQL connector works fine. Thank you!
04-11-2022 11:29 AM
Hi @Stefan Plank,
Thank you for your reply. I will mark the response as "best".