02-16-2022 05:13 AM
Hello,
I created a Delta table using SQL, specifying the partitioning and Z-Order strategy, and then loaded data into it for the first time with a Delta write in append mode via saveAsTable. However, I don't know of a way to verify its Z-Ordering and can't see anything notable in the Delta table history. Is there a quick command that will confirm whether an existing table is Z-Ordered and how it is Z-Ordered? Adding some more information:
What I'm looking for is to know the state of any version of the table. If I pull the current information about the table, it should tell me that the table is partitioned by such and such, Z-Ordered by such and such, and any other relevant details. Here is what I did:
CREATE TABLE IF NOT EXISTS mydb.mybronze
(deviceName STRING, tagName STRING, deviceID STRING, success BOOLEAN, datatype STRING, timestamp TIMESTAMP, value STRING, registerId STRING, description STRING, year INT, month INT, day INT)
USING DELTA
PARTITIONED BY (year, month, day)
LOCATION "abfss://mycontainer@myaccount.dfs.core.windows.net/path1/path2/table1";
ALTER TABLE mydb.mybronze
SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true, delta.autoOptimize.autoCompact = true);
OPTIMIZE mydb.mybronze
ZORDER BY deviceName, tagName;
Then I loaded data into the table for the first time in Python:
df.write.format("delta").mode("append").saveAsTable(f"mydb.mybronze")
So how do I know it successfully Z-Ordered the data? Or do I need to run an OPTIMIZE with ZORDER after the first load? I'm not seeing this information in the history after the initial load.
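For reference, the closest I've gotten is checking the partitioning and the properties I set, which still tells me nothing about Z-Ordering itself (a rough sketch, run in a notebook where spark is already defined):

# Partition columns and file counts come back from DESCRIBE DETAIL
spark.sql("DESCRIBE DETAIL mydb.mybronze") \
    .select("partitionColumns", "numFiles", "sizeInBytes") \
    .show(truncate=False)

# The auto-optimize properties I set with ALTER TABLE show up here
spark.sql("SHOW TBLPROPERTIES mydb.mybronze") \
    .filter("key LIKE 'delta.autoOptimize%'") \
    .show(truncate=False)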
02-16-2022 07:25 AM
It should show up in the table history as an OPTIMIZE operation, though.
In the operationParameters column of that history entry you get the following parameters:
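A rough sketch of pulling them out of the history (reusing the table name from your post; the exact parameter keys can vary between runtime versions, but zOrderBy is the one to look at):

# List every OPTIMIZE commit together with the Z-Order columns it recorded
spark.sql("DESCRIBE HISTORY mydb.mybronze") \
    .filter("operation = 'OPTIMIZE'") \
    .selectExpr("version", "timestamp",
                "operationParameters['zOrderBy'] AS zOrderBy",
                "operationParameters['predicate'] AS predicate") \
    .show(truncate=False)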
02-20-2022 10:52 AM
So I edited my question a bit to show some code I ran. I was expecting the first load to Z-Order and optimize based on the table definition and the auto-optimize settings. I know that once I explicitly run OPTIMIZE (idempotent) with ZORDER (not idempotent) it gets logged in the table history. But after the initial load there is nothing telling me whether this table is already Z-Ordered. It would be nice to have something that reports the table's current state with respect to Z-Ordering, for example whether any exists. As of now I have no idea whether any optimization or Z-Ordering was performed during the initial load.
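To illustrate, all the initial load leaves behind is the WRITE commit and its metrics; nothing mentions Z-Ordering (a sketch against the table from my post):

# In my case this only shows the table creation, the property change, and the WRITE commit
spark.sql("DESCRIBE HISTORY mydb.mybronze") \
    .select("version", "operation", "operationParameters", "operationMetrics") \
    .orderBy("version") \
    .show(truncate=False)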
02-21-2022 01:47 AM
OK, I get it.
I totally agree, by the way. For now we do not have that, but it would indeed be nice.
How it could be implemented is another question, because a table may have been Z-Ordered weeks or even months ago.
I guess Databricks is working on statistics/info like that.
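The closest workaround I can think of (just a sketch, not an official feature) is to walk the history backwards and take the Z-Order columns from the most recent OPTIMIZE commit, keeping in mind that any writes after that commit are not Z-Ordered:

from pyspark.sql import functions as F

# Most recent OPTIMIZE commit that recorded Z-Order columns, if there is one.
# zOrderBy is stored as a string such as ["deviceName","tagName"]; depending on
# the runtime, a plain OPTIMIZE may record [] rather than null, so skip both.
last_zorder = (
    spark.sql("DESCRIBE HISTORY mydb.mybronze")
    .filter("operation = 'OPTIMIZE'")
    .filter("operationParameters['zOrderBy'] IS NOT NULL "
            "AND operationParameters['zOrderBy'] != '[]'")
    .orderBy(F.col("version").desc())
    .selectExpr("version", "timestamp", "operationParameters['zOrderBy'] AS zOrderBy")
    .limit(1)
)
last_zorder.show(truncate=False)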
03-14-2022 10:48 PM
If there is no data in the table yet, the OPTIMIZE ... ZORDER BY statement will not have any impact. The CREATE TABLE statement only creates an empty table; the actual load happens when you run the df.write operation.
Also, delta.autoOptimize.autoCompact will not trigger Z-Ordering; please refer to the doc below:
https://docs.databricks.com/delta/optimizations/auto-optimize.html#does-auto-optimize-z-order-files
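So after the df.write has actually loaded data, an explicit OPTIMIZE ... ZORDER BY is still needed; something like this, reusing the table and columns from the question:

# Z-Ordering only happens when OPTIMIZE runs against data already in the table
spark.sql("OPTIMIZE mydb.mybronze ZORDER BY (deviceName, tagName)")

# The resulting commit then shows up in the history as an OPTIMIZE operation
spark.sql("DESCRIBE HISTORY mydb.mybronze") \
    .filter("operation = 'OPTIMIZE'") \
    .show(truncate=False)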