โ01-09-2024 01:28 PM
Hey everyone,
We're making the move to Unity Catalog from Hive_Metastore and we're running into some issues performing Z-order optimizations on some of our tables.
These tables are, in either place, materialized views created with a "create or refresh live table" statement. They have the same schemas and data.
However, when we try to Z-order these Unity Catalog materialized views, we get the following message:
AnalysisException: <table_name> is a materialized view. 'OPTIMIZE' expects a table.
Given that are Hive_metastore equivalent tables are also materalized views, we're unsure why we only error out on the Unity Catalog ones. Has anyone run into this issue, and if so, how did you get around it?
Thanks in advance!
โ01-19-2024 07:14 AM
For anyone who's reading this later: You can still Z-order your materialized views, but you can't run it as a SQL command. Instead, you can set it as one of the TBLPROPERTIES when you define the table. Here's an example:
create or refresh live table {new_table}
tblproperties (
"quality" = "gold",
"pipelines.autoOptimize.zOrderCols" = "KEY_1,KEY_2"
)
as
select * from {old_table}
This won't Z-order things immediately, but it will do the Z-ordering during the pipeline's maintenance job. It also means that your tables won't be Z-ordered until the maintenance job has run, and to my knowledge you have little control over when it will run. This can be a problem if you're running the pipelines on a daily basis.
โ01-11-2024 09:44 AM
Hi @Jake2 - we cannot run OPTIMIZE against materialized views. Please refer to the below limitations.
https://docs.databricks.com/en/sql/user/materialized-views.html#limitations
Thanks,
Shan
โ01-19-2024 06:44 AM
Hi shan_chandra, thanks for the response.
We found that piece of documentation, but there's a couple points that we find vexing about this.
I feel like the MVs we have in hive_metastore and the MVs we have in Unity Catalog have fundamentally different behavior and limitations, and it's been a challenge to navigate.
โ01-19-2024 07:14 AM
For anyone who's reading this later: You can still Z-order your materialized views, but you can't run it as a SQL command. Instead, you can set it as one of the TBLPROPERTIES when you define the table. Here's an example:
create or refresh live table {new_table}
tblproperties (
"quality" = "gold",
"pipelines.autoOptimize.zOrderCols" = "KEY_1,KEY_2"
)
as
select * from {old_table}
This won't Z-order things immediately, but it will do the Z-ordering during the pipeline's maintenance job. It also means that your tables won't be Z-ordered until the maintenance job has run, and to my knowledge you have little control over when it will run. This can be a problem if you're running the pipelines on a daily basis.
โ01-19-2024 10:21 AM
Hi @Jake2 - Thanks for your response. The above method to z-order as a part of the DLT pipeline maintenance job will address this. On Comparison between UC materialized views vs Hive metastore materialized views, Materialized views outside of UC are just the tables, fully recomputed. In UC, the underlying architecture is modified to allow an incremental compute.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group