Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Z-Ordering a Unity Catalog Materialized View

Jake2
New Contributor III

Hey everyone, 

We're making the move to Unity Catalog from Hive_Metastore and we're running into some issues performing Z-order optimizations on some of our tables. 

In both metastores, these tables are materialized views created with a "create or refresh live table" statement. They have the same schemas and data.

However, when we try to Z-order these Unity Catalog materialized views, we get the following message:

 

AnalysisException: <table_name> is a materialized view. 'OPTIMIZE' expects a table.

 

Given that our Hive_metastore equivalent tables are also materialized views, we're unsure why we only error out on the Unity Catalog ones. Has anyone run into this issue, and if so, how did you get around it?

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions

Jake2
New Contributor III

For anyone who's reading this later: You can still Z-order your materialized views, but you can't run it as a SQL command. Instead, you can set it as one of the TBLPROPERTIES when you define the table. Here's an example:

create or refresh live table {new_table}
tblproperties (
	"quality" = "gold",
	"pipelines.autoOptimize.zOrderCols" = "KEY_1,KEY_2"
)
as
select * from {old_table}

This won't Z-order the data immediately; the Z-ordering happens during the pipeline's maintenance job. That also means your tables won't be Z-ordered until that job has run, and to my knowledge you have little control over when it runs. This can be a problem if you're running the pipelines on a daily basis.
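If the pipeline is defined in Python rather than SQL, the same property can be set through the DLT decorator. A minimal sketch under that assumption; the table and column names here are placeholders, not names from this thread:

```python
import dlt  # available only inside a Delta Live Tables pipeline

# Hypothetical gold table; "old_table" and the KEY_* columns are placeholders.
@dlt.table(
    table_properties={
        "quality": "gold",
        # Columns to Z-order during the pipeline's scheduled maintenance job:
        "pipelines.autoOptimize.zOrderCols": "KEY_1,KEY_2",
    },
)
def new_table():
    # spark is provided by the DLT runtime
    return spark.read.table("old_table")
```

As with the SQL version, the property only takes effect when the maintenance job runs, not at pipeline update time.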


4 REPLIES

shan_chandra
Databricks Employee

Hi @Jake2 - OPTIMIZE cannot be run against materialized views. Please refer to the limitations below.

https://docs.databricks.com/en/sql/user/materialized-views.html#limitations

Thanks,

Shan

Jake2
New Contributor III

Hi shan_chandra, thanks for the response.

We found that piece of documentation, but there are a couple of points we find vexing about it:

  • It mentions that MVs are "Unity Catalog managed tables" - we have been using them on hive_metastore for quite some time now.
  • The limitations do say you can't run ad-hoc OPTIMIZE commands against MVs, but again, we've been doing that for quite some time on our hive_metastore MVs. 
  • To that point, there isn't a clear indicator as to what's ad-hoc and what isn't. If we wanted to OPTIMIZE the MVs on a non-ad-hoc basis, there isn't anything on the page that shows you how.

I feel like the MVs we have in hive_metastore and the MVs we have in Unity Catalog have fundamentally different behavior and limitations, and it's been a challenge to navigate.


shan_chandra
Databricks Employee

Hi @Jake2 - Thanks for your response. Z-ordering as part of the DLT pipeline maintenance job, as you describe, is the way to address this. As for the comparison between UC and Hive metastore materialized views: materialized views outside of UC are just tables that are fully recomputed on each refresh, whereas in UC the underlying architecture supports incremental recomputation.
