cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Z-Ordering a Unity Catalog Materialized View

Jake2
New Contributor III

Hey everyone, 

We're making the move to Unity Catalog from Hive_Metastore and we're running into some issues performing Z-order optimizations on some of our tables. 

These tables are, in either place, materialized views created with a "create or refresh live table" statement. They have the same schemas and data. 

However, when we try to Z-order these Unity Catalog materialized views, we get the following message:

 

AnalysisException: <table_name> is a materialized view. 'OPTIMIZE' expects a table.

 

Given that are Hive_metastore equivalent tables are also materalized views, we're unsure why we only error out on the Unity Catalog ones. Has anyone run into this issue, and if so, how did you get around it? 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions

Jake2
New Contributor III

For anyone who's reading this later: You can still Z-order your materialized views, but you can't run it as a SQL command. Instead, you can set it as one of the TBLPROPERTIES when you define the table. Here's an example:

create or refresh live table {new_table}
tblproperties (
	"quality" = "gold",
	"pipelines.autoOptimize.zOrderCols" = "KEY_1,KEY_2"
)
as
select * from {old_table}

This won't Z-order things immediately, but it will do the Z-ordering during the pipeline's maintenance job. It also means that your tables won't be Z-ordered until the maintenance job has run, and to my knowledge you have little control over when it will run. This can be a problem if you're running the pipelines on a daily basis. 

View solution in original post

5 REPLIES 5

shan_chandra
Honored Contributor III
Honored Contributor III

Hi @Jake2  -  we cannot run OPTIMIZE against materialized views. Please refer to the below limitations. 

https://docs.databricks.com/en/sql/user/materialized-views.html#limitations

Thanks,

Shan

Jake2
New Contributor III

Hi shan_chandra, thanks for the response.

We found that piece of documentation, but there's a couple points that we find vexing about this.

  • It mentions that MVs are "Unity Catalog managed tables" - we have been using them on hive_metastore for quite some time now.
  • The limitations do say you can't run ad-hoc OPTIMIZE commands against MVs, but again, we've been doing that for quite some time on our hive_metastore MVs. 
  • To that point, there isn't a clear indicator as to what's ad-hoc and what isn't. If we wanted to OPTIMIZE the MVs on a non-ad-hoc basis, there isn't anything on the page that shows you how.

I feel like the MVs we have in hive_metastore and the MVs we have in Unity Catalog have fundamentally different behavior and limitations, and it's been a challenge to navigate.

Kaniz
Community Manager
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
 

Jake2
New Contributor III

For anyone who's reading this later: You can still Z-order your materialized views, but you can't run it as a SQL command. Instead, you can set it as one of the TBLPROPERTIES when you define the table. Here's an example:

create or refresh live table {new_table}
tblproperties (
	"quality" = "gold",
	"pipelines.autoOptimize.zOrderCols" = "KEY_1,KEY_2"
)
as
select * from {old_table}

This won't Z-order things immediately, but it will do the Z-ordering during the pipeline's maintenance job. It also means that your tables won't be Z-ordered until the maintenance job has run, and to my knowledge you have little control over when it will run. This can be a problem if you're running the pipelines on a daily basis. 

shan_chandra
Honored Contributor III
Honored Contributor III

Hi @Jake2 - Thanks for your response. The above method to z-order as a part of the DLT pipeline maintenance job will address this. On Comparison between UC materialized views vs Hive metastore materialized views, Materialized views outside of UC are just the tables, fully recomputed. In UC, the underlying architecture is modified to allow an incremental compute. 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.