Databricks-Connect shows different partitions than Databricks for the same Delta table

s_plank
New Contributor III

Hello,

Here is a small code snippet:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
 
spark.sql('SHOW PARTITIONS database.table').show()

The output in the Databricks notebook:

+-------------+-------+--------------------+
|projectNumber|plantId|                name|
+-------------+-------+--------------------+
|         xxxx|     P0|***.yyyy............|
|         yyyy|     P2|***.yyyy............|
...

When I run the same code as above in Visual Studio Code, connected to the same cluster through Databricks-Connect, I receive this output:

+---------+
|partition|
+---------+
|     xxxx|
|     yyyy|
...

This output has the wrong column name (partition) and shows only the values of the first partition column (projectNumber).

This is strange. The code, the cluster and the table are identical, so the output should be the same.

DESCRIBE TABLE returns the correct partition columns in both Databricks Connect and the Databricks notebook:

spark.sql('describe table database.table').show()
 
+--------------+-------------+-------+
|      col_name|    data_type|comment|
+--------------+-------------+-------+
|# Partitioning|             |       |
|        Part 0|projectNumber|       |
|        Part 1|      plantId|       |
|        Part 2|         name|       |
+--------------+-------------+-------+
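To check programmatically which partition columns SHOW PARTITIONS is missing, the DESCRIBE TABLE rows can be compared against the SHOW PARTITIONS column names. This is a minimal sketch, assuming the (col_name, data_type, comment) row layout shown above, with partition columns listed as "Part N" rows under "# Partitioning". Both function names are hypothetical helpers, not part of any Databricks or PySpark API.

```python
from typing import Iterable, List, Sequence


def partition_columns_from_describe(rows: Iterable[Sequence[str]]) -> List[str]:
    """Return the partition columns listed in DESCRIBE TABLE output.

    Assumes each row is (col_name, data_type, comment) and that partition
    columns appear as "Part N" rows after a "# Partitioning" marker row.
    """
    partition_cols: List[str] = []
    in_partitioning = False
    for row in rows:
        col_name = (row[0] or "").strip()
        data_type = (row[1] or "").strip()
        if col_name == "# Partitioning":
            in_partitioning = True
            continue
        if in_partitioning:
            # An empty row or another "# ..." section header ends the section.
            if not col_name or col_name.startswith("#"):
                break
            if col_name.startswith("Part "):
                # For "Part N" rows, the data_type column holds the column name.
                partition_cols.append(data_type)
    return partition_cols


def missing_partition_columns(
    show_columns: Sequence[str], describe_rows: Iterable[Sequence[str]]
) -> List[str]:
    """Partition columns reported by DESCRIBE TABLE but absent from SHOW PARTITIONS."""
    expected = partition_columns_from_describe(describe_rows)
    return [c for c in expected if c not in show_columns]
```

In either environment it could be called as missing_partition_columns(spark.sql('SHOW PARTITIONS database.table').columns, spark.sql('DESCRIBE TABLE database.table').collect()), since PySpark Row objects are tuples. An empty list means every partition column came back by name; for the db-connect output above it would report all three, because the only column returned is named partition.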

The table is a Delta table stored in Azure Blob Storage.

I tried refreshing the table, but it made no difference.

I found a difference in the SQL tab of the Spark UI.

There are 3 queries for the db-connect run and 4 for the Databricks run.

The physical execution plan is identical, but the second "Execute ShowPartitionsDeltaCommand" query is missing from the db-connect run.

Queries for db-connect:

  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • bigger execution plan (identical in both cases) | Output: [projectNumber, plantId, name]
  • LocalTableScan | Output: [partition]

Queries for databricks:

  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • bigger execution plan (identical in both cases) | Output: [projectNumber, plantId, name]
  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • LocalTableScan | Output: [projectNumber, plantId, name]

I don't know why or how, but the other two partition columns (plantId and name) get lost in the db-connect query.

Any ideas?

1 ACCEPTED SOLUTION

Accepted Solutions

User16763506477
Contributor III

Hi @Stefan Plank,

There seems to be an issue with Databricks Connect and SQL queries. Could you please try the SQL Connector?

More info: https://docs.databricks.com/dev-tools/python-sql-connector.html

For Python development that runs SQL queries, the SQL Connector is usually the recommended option.

More info: https://docs.databricks.com/dev-tools/databricks-connect.html#overview

Let me know if this works for you.
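A minimal sketch of running the same SHOW PARTITIONS query through the Databricks SQL Connector for Python (package databricks-sql-connector), assuming its DB-API style interface: databricks.sql.connect(server_hostname=..., http_path=..., access_token=...), with values taken from the connection details of your cluster or SQL warehouse. The function name and the injectable connect parameter are hypothetical, added only so the logic can be exercised without a workspace.

```python
from typing import Any, Callable, List, Optional, Tuple


def show_partitions_via_sql_connector(
    table: str,
    connect: Optional[Callable[..., Any]] = None,
    **conn_kwargs: Any,
) -> Tuple[List[str], List[tuple]]:
    """Run SHOW PARTITIONS through the Databricks SQL Connector (sketch).

    `connect` defaults to databricks.sql.connect; conn_kwargs are passed
    through unchanged (e.g. server_hostname, http_path, access_token).
    Returns (column_names, rows).
    """
    if connect is None:
        # Requires: pip install databricks-sql-connector
        from databricks import sql
        connect = sql.connect
    with connect(**conn_kwargs) as connection:
        with connection.cursor() as cursor:
            # The table name is interpolated directly; pass trusted identifiers only.
            cursor.execute(f"SHOW PARTITIONS {table}")
            # DB-API: each description entry is a tuple whose first item is the column name.
            columns = [d[0] for d in cursor.description]
            rows = [tuple(r) for r in cursor.fetchall()]
    return columns, rows
```

Usage would look like show_partitions_via_sql_connector("database.table", server_hostname="<workspace-host>", http_path="<http-path>", access_token="<token>"); the returned column names can then be compared with the ones shown in the notebook.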


6 replies

-werners-
Esteemed Contributor III

The docs say the SQL API is supported for Delta Lake, so I would expect both to return the same results.

But clearly that is not the case.

Which version of db-connect are you using?

s_plank
New Contributor III

db-connect version 9.1.9

cluster runtime: Databricks Runtime 9.1 LTS

Python 3.8.10

jose_gonzalez
Moderator

Hi @Stefan Plank,

Just checking whether you still need help. Did @Gaurav Rupnar's recommendation help you resolve your issue?

s_plank
New Contributor III

Hi @Jose Gonzalez,

Yes, the SQL Connector works fine. Thank you!

Hi @Stefan Plank,

Thank you for your reply. I will mark the response as "best".
