โ09-18-2024 07:25 PM
Hi; I'm new to Databricks, so apologies if this is a dumb question.
I have a notebook with SQL cells that are selecting data from various Delta tables into temporary views. Then I have a query that joins up the data from these temporary views.
I'd like to use that query to populate a materialized view, however, the CREATE statement throws an error complaining that the temp views cannot be found. This is course rubbish, because it finds the views just fine when it's just a SELECT. The error only happens when the CREATE AS is appended to the front.
So,
"SELECT foo, bar FROM viewa" -- works fine.
"CREATE OR REPLACE MATERIALIZED VIEW matview AS SELECT foo, bar FROM viewa" -- says viewa cannot be found.
I just ran this with CREATE OR REPLACE TABLE instead, and that worked. So I guess this is a limitation of materialized views? If so, 1) can this limitation be noted in the mat view documentation, and b) can the error message be corrected to explain that?
โ09-26-2024 01:55 PM
Thanks, FelixIvy. Just to clarify, the reason you can't use temporary views to load a materialized view is because materialized views (like regular views) must be created using a single query that is saved as part of the view definition. So the solution in this case is to rewrite the temporary views as ctes or subqueries, so all of the code is in one single query -- as if it were a regular view.
Or, if the code is complex enough, it may be cleaner, more visible, and easier to maintain and optimise to leave the code as-is, in a notebook: load the data into a table, and then schedule the notebook to refresh the data in the table periodically.
โ09-18-2024 08:06 PM - edited โ09-18-2024 08:16 PM
Hi @TamD
You cannot create a materialized view on a temporary view because temporary views are session-based. Once the session expires, the temporary view will be dropped. To resolve this, you need to create either a regular view or a table, and then you can successfully create a materialized view.
The best practice is to create a materialized view directly on a table.
โ09-18-2024 08:30 PM
Hi Akshay
Yes, the temporary view exists for the duration of the session. The table in my example above is loaded in the same session as the temp views and that works fine. The materialized view is also loaded in the same session as the temp views, only the mat view creation "cannot find" the temp views.
It makes sense that a VIEW would not work created against temporary views, however the data in a materialized view is persisted and does not change until the notebook is run again to refresh the data: once the data is loaded, the materialized view has no dependency on the temp views used to load it until the notebook is run again (and the temp views are recreated). Therefore, I don't see a reason why a table load would work and a mat view load would not.
โ09-18-2024 09:11 PM
Hi @TamD
โ09-22-2024 01:05 PM
I see. The issue is that although a mat view has to be scheduled and loaded (like a table), it still must be defined as if it were a regular view. A table can be loaded and refreshed via a notebook, but a view (even a materialised one) has to have a single query as its underlying definition (you can't use a notebook). So I can either keep my notebook as-is and schedule it to load/refresh a table, or I can re-write it into a bunch of ctes and use that to define a mat view.
โ09-22-2024 11:14 PM
Temporary views are session-based and might not be used for materialized views. Use a permanent table or view.
โ09-26-2024 01:55 PM
Thanks, FelixIvy. Just to clarify, the reason you can't use temporary views to load a materialized view is because materialized views (like regular views) must be created using a single query that is saved as part of the view definition. So the solution in this case is to rewrite the temporary views as ctes or subqueries, so all of the code is in one single query -- as if it were a regular view.
Or, if the code is complex enough, it may be cleaner, more visible, and easier to maintain and optimise to leave the code as-is, in a notebook: load the data into a table, and then schedule the notebook to refresh the data in the table periodically.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group