cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Materialized view resources created after creation query failed

VitorAvancini
New Contributor III

Hello, Im testing with Materialized Views and I ran the following query:

create materialized view `xx`.`xx`.`stg_sales`
PARTITIONED BY (id, external_id, tenant_id)
TBLPROPERTIES ('cluster_by' = 'id') as
with stg_accounts_events as (
select
    *
from
    `xx`.`xx`.`snapshot_account_events`
),
cast_columns as (
select
    id,
    external_id,
    tenant_id
from
    stg_accounts_events
)
select
*
from
cast_columns

This query was 'redacted'  and it was a very heavy query on its real context. We have query max timeout set to 10 minutes and this query took longer than 10 minutes it did timeout.

The only log I had was that the query timed out.

With all that said, a materialized view was created with a name __staging__acounts_{some hash id}__ which I did not create, im assuming its some 'behind the scenes' resource created by databricks. Also a DLT job was created and occurred in costs for some days.

Im aware the DLT is how MVs gets refreshed behind the scenes, the issue here was that my query did not work and my MV was not created, however the managed databricks resources were created and I had no way of knowing this without querying system tables and finding MVs that I did not purposely created.

Is this a bug? if not and this is the expected behaviour, is this documented somewhere making clear that I have to cleanup resources created by databricks myself when my query do not work?

Thank you!

3 REPLIES 3

Brahmareddy
Honored Contributor III

Hi Victor,

How are doing today?, As per my understanding, Yeah, this seems like expected behavior, but it's definitely not well-documented. When you create a Materialized View (MV) in Databricks, it automatically sets up a DLT (Delta Live Tables) pipeline behind the scenes to manage and refresh the view. Even if the query times out or fails, Databricks still creates internal resources, like the MV metadata (__staging__accounts_{hash}__) and the DLT job, which can continue running and incurring costs unless manually cleaned up. Unfortunately, there isn’t a built-in way to detect failed MV creations easily.

To avoid this in the future, you might want to test your query separately before using it in an MV and monitor system tables (information_schema.materialized_views) for any unexpected MVs. If you notice unwanted resources, you’ll need to manually drop the MV and stop the DLT job. It would be great if Databricks had a clearer way to notify users when an MV creation fails but still leaves behind resources—maybe worth raising a support ticket for better documentation. Let me know if you need help cleaning up those resources!

Regards,

Brahma

VitorAvancini
New Contributor III

Thank you for the very complete answer! 

I keep changing my mind between bug and expected behavior, maybe if it printed out on the error logs that resources were created I would be convinced that this is how it works.

I will raise a support ticket like you have mentioned, thank you Brahma

 

Brahmareddy
Honored Contributor III

You are always welcome, Vitor. Do let me know for any queries. Good day.