cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Quick way to know delta table is zordered

jstatic
New Contributor II

Hello,

I created a delta table table using SQL and specifying the partitioning and zorder strategy. I then loaded data into it for the first time by doing a write as delta with mode of append and save as table. However, I donโ€™t know of a way to verify its zordering and canโ€™t see anything notable in the delta table history. Is there a quick command that will confirm an existing table is zordered and how it is zordered? Adding some more information:

What I'm looking for is to know the state of any version of the table. So if I pulled some current information about the table it should tell me the Table is partitioned by such and such and zordered by such and any other information. So here is what I did:

  1. I created the blank table and defined partitioning and zordering in SQL
CREATE TABLE IF NOT EXISTS mydb.mybronze
  (deviceName STRING, tagName STRING, deviceID STRING, success BOOLEAN, datatype STRING, timestamp TIMESTAMP, value STRING, registerId STRING, description STRING, year INT, month INT, day INT)
USING DELTA
PARTITIONED BY (year, month, day)
LOCATION "abfss://mycontainer@myaccount.dfs.core.windows.net/path1/path2/table1";
 
ALTER TABLE mydb.mybronze
SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true, delta.autoOptimize.autoCompact = true);
 
OPTIMIZE mydb.mybronze
ZORDER BY deviceName, tagName;

Then I loaded into the table for the first time in Python:

df.write.format("delta").mode("append").saveAsTable(f"mydb.mybronze")

So how do I know it successfully zordered the data? Or do I need to run an optimize command with zordering after the first load? I'm not seeing this information in the history after an initial load.

1 ACCEPTED SOLUTION

Accepted Solutions

User16763506477
Contributor III

If there is no data then lines 10 and 11 will not have any impact. I am assuming that line (1-5) is creating an empty table but the actual load is happening when you do df.write operation.

Also delta.autoOptimize.autoCompact will not trigger the z-ordering please refer below doc.

https://docs.databricks.com/delta/optimizations/auto-optimize.html#does-auto-optimize-z-order-files

View solution in original post

5 REPLIES 5

-werners-
Esteemed Contributor III

it should show in the table history though as an OPTIMIZE operation.

You get the following parameters:

  • predicate: 
  • zOrderBy: 
  • batchId: 
  • auto: 

jstatic
New Contributor II

So I edited my question a bit to show some code I ran. I was thinking on the first load it should do the zorder and optimize based on the table definition and auto settings. I know that after I explicitly do an optimize (idempotent) and zorder (not idempotent) it would log in the table history. But on the initial load there's nothing telling me if this table is already zordered. It would be nice to have something that can tell me about the table's current state about its zordering for example if it exists. As for now I have no idea after the initial load if any optimization or zordering was performed.

jstatic
New Contributor II

So I edited my question a bit to show some code I ran. I was thinking on the first load it should do the zorder and optimize based on the table definition and auto settings. I know that after I explicitly do an optimize (idempotent) and zorder (not idempotent) it would log in the table history. But on the initial load there's nothing telling me if this table is already zordered. It would be nice to have something that can tell me about the table's current state about its zordering for example if it exists. As for now I have no idea after the initial load if any optimization or zordering was performed.

-werners-
Esteemed Contributor III

OK I get it.

I totally agree btw. For now we do not have that but it would be nice indeed.

How this could be implemented is another question because a table can be ZORDERED weeks or even months ago.

I guess Databricks is working on statistics/info like that.

User16763506477
Contributor III

If there is no data then lines 10 and 11 will not have any impact. I am assuming that line (1-5) is creating an empty table but the actual load is happening when you do df.write operation.

Also delta.autoOptimize.autoCompact will not trigger the z-ordering please refer below doc.

https://docs.databricks.com/delta/optimizations/auto-optimize.html#does-auto-optimize-z-order-files

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group