Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks-Connect shows different partitions than Databricks for the same delta table

s_plank
New Contributor III

Hello,

here is a small code-snippet:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example_app').getOrCreate()
 
spark.sql('SHOW PARTITIONS database.table').show()

The output inside the Databricks-Notebook:

+-------------+-------+--------------------+
|projectNumber|plantId|                name|
+-------------+-------+--------------------+
|         xxxx|     P0|***.yyyy............|
|         yyyy|     P2|***.yyyy............|
...

When I run the same code as above in Visual Studio Code, connected to the same cluster through Databricks-Connect, I receive this output:

+---------+
|partition|
+---------+
|     xxxx|
|     yyyy|
...

This output has the wrong column name ("partition") and shows only the first partition column (projectNumber); plantId and name are missing.

This is strange. Everything is identical so the output should be the same.

I receive the correct partitions through sql-describe in both databricks-connect and databricks:

spark.sql('describe table database.table').show()
 
+--------------+-------------+-------+
|      col_name|    data_type|comment|
+--------------+-------------+-------+
|# Partitioning|             |       |
|        Part 0|projectNumber|       |
|        Part 1|      plantId|       |
|        Part 2|         name|       |
+--------------+-------------+-------+

The table is a Delta table located in Azure Blob Storage.

I tried to refresh the table but this makes no difference.
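
To make the mismatch explicit, the partition columns reported by DESCRIBE can be compared with the column names that SHOW PARTITIONS returns. The following is only a minimal sketch: the helper names `partition_columns_from_describe` and `missing_partition_columns` are hypothetical, and it assumes the DESCRIBE output marks partition columns with a "# Partitioning" row followed by "Part N" rows, as in the output above.

```python
def partition_columns_from_describe(rows):
    """Extract partition column names from DESCRIBE TABLE rows.

    `rows` is an iterable of (col_name, data_type, comment) tuples.
    Assumption: partition columns follow a '# Partitioning' row and are
    listed as 'Part 0', 'Part 1', ... with the column name in data_type.
    """
    in_section = False
    columns = []
    for col_name, data_type, _comment in rows:
        name = (col_name or "").strip()
        if name == "# Partitioning":
            in_section = True
            continue
        if in_section:
            if name.startswith("Part "):
                columns.append((data_type or "").strip())
            else:
                break  # the partitioning section has ended
    return columns


def missing_partition_columns(describe_rows, show_partitions_columns):
    """Return partition columns that SHOW PARTITIONS did not report by name."""
    expected = partition_columns_from_describe(describe_rows)
    return [c for c in expected if c not in show_partitions_columns]


# Rows copied from the DESCRIBE output shown above.
describe_rows = [
    ("# Partitioning", "", ""),
    ("Part 0", "projectNumber", ""),
    ("Part 1", "plantId", ""),
    ("Part 2", "name", ""),
]

# Column names as seen in the notebook and through Databricks-Connect.
print(missing_partition_columns(describe_rows, ["projectNumber", "plantId", "name"]))
print(missing_partition_columns(describe_rows, ["partition"]))
```

With a live session, the inputs could be taken from `spark.sql('describe table database.table').collect()` and `spark.sql('SHOW PARTITIONS database.table').columns`. An empty list means both commands agree on the partition columns.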

I found a difference in the Spark-UI SQL tab.

There are 3 queries for the db-connect run and 4 for the databricks run.

The physical execution plan is identical, but the second "Execute ShowPartitionsDeltaCommand" query is missing in the db-connect run.

Queries for db-connect:

  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • bigger execution plan (identical in both cases) | Output: [projectNumber, plantId, name]
  • LocalTableScan | Output: [partition]

Queries for databricks:

  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • bigger execution plan (identical in both cases) | Output: [projectNumber, plantId, name]
  • Execute ShowPartitionsDeltaCommand | Output: [projectNumber, plantId, name]
  • LocalTableScan | Output: [projectNumber, plantId, name]

I don't know why or how, but two of the three partition columns get lost with the db-connect query.

Any ideas?

1 ACCEPTED SOLUTION

User16763506477
Contributor III

Hi @Stefan Plank,

There seems to be an issue with Databricks Connect and SQL queries. Could you please try the SQL connector?

More info: https://docs.databricks.com/dev-tools/python-sql-connector.html

It is usually recommended to use the SQL connector if you are developing in Python with SQL queries.

More info: https://docs.databricks.com/dev-tools/databricks-connect.html#overview

Let me know if this works for you.
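
A minimal sketch of what the suggested approach could look like with the Databricks SQL Connector for Python (`pip install databricks-sql-connector`). The environment variable names and the helper `fetch_partitions` are assumptions for illustration, not something from this thread; the hostname, HTTP path, and token have to come from your own workspace.

```python
import os


def fetch_partitions(connection, table_name):
    """Run SHOW PARTITIONS over a DB-API style connection.

    Returns (column_names, rows). `table_name` is interpolated directly
    into the statement, so only pass trusted identifiers.
    """
    with connection.cursor() as cursor:
        cursor.execute(f"SHOW PARTITIONS {table_name}")
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    return columns, rows


def main():
    # Imported here so the helper above can be used without the package.
    from databricks import sql

    # Assumed environment variable names; adjust to your own setup.
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        columns, rows = fetch_partitions(connection, "database.table")
        print(columns)
        for row in rows:
            print(row)
```

Calling `main()` after setting the three environment variables should print the partition columns followed by one row per partition. Whether the result matches the notebook output is something to verify against your own table.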

6 REPLIES

-werners-
Esteemed Contributor III

The docs say the SQL API is supported for Delta Lake, so I would assume both return the same results.

But clearly that is not the case.

What version of db-connect do you use?

s_plank
New Contributor III

db-connect version 9.1.9

cluster db-runtime 9.1 LTS

Python 3.8.10

jose_gonzalez
Moderator

Hi @Stefan Plank,

Just checking if you still need help. Did @Gaurav Rupnar's recommendation help you resolve your issue?

s_plank
New Contributor III

Hi @Jose Gonzalez,

Yes, the SQL connector works fine. Thank you!

Hi @Stefan Plank,

Thank you for your reply. I will mark the response as "best".
