Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Drop table not working consistently

NW1000
New Contributor III

During development, I manually drop the table table_frq with a SQL query. Then I run a Python notebook on serverless compute that uses spark.catalog.tableExists("table_frq") as a condition. Last week, after the DROP TABLE, spark.catalog.tableExists("table_frq") returned "false". However, yesterday it returned "true". I tried serverless and classic compute on Runtime 17.3 LTS ML and 18.0 ML, and all returned "true". Has anything changed in #UC? My suspicion is that the table's metadata is corrupted, but how?

Then today DROP TABLE worked again via a SQL query. I am concerned that I don't know the reason for the change and that the system may not be stable. If anyone could share their experience, it would be greatly appreciated.

3 REPLIES

nayan_wylde
Esteemed Contributor II

This is a real, known class of behavior with Unity Catalog (UC), and what you observed does not point to data corruption. It points instead to metadata visibility, caching, and catalog-context interactions, especially with serverless compute and spark.catalog.tableExists().

- Nothing was "corrupted."
- Nothing was permanently unstable.
- The behavior you saw is consistent with Unity Catalog metadata caching, catalog context, and serverless/SQL path differences.
- spark.catalog.tableExists() is not a strong guarantee under UC, especially right after DDL changes.
- The fact that DROP TABLE started behaving "normally" again strongly indicates cache invalidation / propagation timing, not corruption.

NW1000
New Contributor III

Are there actions which can be taken to make the process consistent?

SteveOstrowski
Databricks Employee

@NW1000

Thanks for the thorough description. I can understand the concern about inconsistent behavior here. The previous reply from nayan_wylde is on the right track -- this is almost certainly a metadata caching issue, not data corruption. Let me give you concrete actions you can take to make your workflow reliable and consistent.


WHY THIS HAPPENS

When you run DROP TABLE via a SQL query (for example, in the SQL editor or a SQL warehouse), the Unity Catalog metastore processes the DDL and removes the table's metadata. However, Spark sessions on other compute resources (notebooks on serverless, classic clusters, etc.) maintain their own local metadata cache. The spark.catalog.tableExists() API reads from this local Spark catalog cache, which may not immediately reflect changes made by a different compute session or a different cluster.

This is especially pronounced when:
- You drop a table in one compute context (e.g., SQL editor) and check existence in another (e.g., a Python notebook on serverless)
- You have long-running Spark sessions or clusters that have already cached the table metadata
- You are using serverless compute, which may reuse warm executor pools that retain cached metadata from prior executions
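To make that failure mode concrete, here is a minimal pure-Python model of the situation described above: two "sessions" that each keep a local cache over a shared metastore. All class and method names here are invented for illustration; this is not Databricks code.

```python
# Toy model (illustration only, not Databricks internals): a shared
# metastore plus per-session metadata caches.

class Metastore:
    def __init__(self):
        self.tables = set()

class Session:
    def __init__(self, metastore):
        self.metastore = metastore
        self.cache = {}  # table name -> cached existence flag

    def table_exists(self, name):
        # Like spark.catalog.tableExists(): serve from the local cache
        # if present, otherwise consult the metastore and cache the answer.
        if name not in self.cache:
            self.cache[name] = name in self.metastore.tables
        return self.cache[name]

    def drop_table(self, name):
        # DDL goes straight to the metastore and invalidates *this*
        # session's cache -- but not any other session's.
        self.metastore.tables.discard(name)
        self.cache.pop(name, None)

store = Metastore()
store.tables.add("table_frq")

sql_session = Session(store)       # stands in for the SQL editor
notebook_session = Session(store)  # stands in for a serverless notebook

notebook_session.table_exists("table_frq")  # warms the notebook's cache
sql_session.drop_table("table_frq")         # DROP TABLE in the SQL editor

print(notebook_session.table_exists("table_frq"))  # True -- stale!
print(Session(store).table_exists("table_frq"))    # False -- fresh session
```

The notebook session keeps answering from its warm cache, exactly like the "true" you saw after the drop, while a fresh (cold-cache) session sees the real state.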


RECOMMENDED ACTIONS FOR CONSISTENT BEHAVIOR

Here are several approaches you can use, from simplest to most robust:

1. Use REFRESH TABLE or SQL-based existence checks

Instead of relying on spark.catalog.tableExists(), run a SQL-based check that forces a fresh metadata lookup:

# Option A: Use SQL to check existence (more reliable under UC)
def table_exists(spark, table_name):
    try:
        spark.sql(f"DESCRIBE TABLE {table_name}")
        return True
    except Exception:
        return False

Or use SHOW TABLES with a filter:

result = spark.sql("SHOW TABLES IN your_catalog.your_schema LIKE 'table_frq'")
exists = result.count() > 0

2. Invalidate the Spark cache before checking

If you want to keep using spark.catalog.tableExists(), force a cache refresh first:

# Clear all cached metadata
spark.sql("CLEAR CACHE")

# Now check -- this should reflect the current state
spark.catalog.tableExists("table_frq")

Or target a specific table (note: this only works if the table still exists):

spark.sql("REFRESH TABLE your_catalog.your_schema.table_frq")

3. Use fully qualified table names

Always use three-part names (catalog.schema.table) to avoid any ambiguity about which catalog or schema context is being used:

spark.catalog.tableExists("your_catalog.your_schema.table_frq")

This ensures you are not accidentally checking against a different catalog or schema default that may vary between compute sessions.
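To make three-part naming easy to enforce, a small helper can fill in missing catalog and schema parts before any check runs. This is a sketch, not a Databricks API; the function name and the `main`/`default` fallbacks are assumptions you should replace with your own defaults.

```python
# Hypothetical helper (not a Databricks API): fill in missing catalog and
# schema parts so every check uses a fully qualified three-part name.
def qualify(table_name, default_catalog="main", default_schema="default"):
    parts = table_name.split(".")
    if len(parts) == 3:          # already catalog.schema.table
        return table_name
    if len(parts) == 2:          # schema.table -> prepend the catalog
        return f"{default_catalog}.{table_name}"
    if len(parts) == 1:          # bare table -> prepend catalog and schema
        return f"{default_catalog}.{default_schema}.{table_name}"
    raise ValueError(f"Unexpected table name: {table_name!r}")

print(qualify("table_frq"))            # main.default.table_frq
print(qualify("my_schema.table_frq"))  # main.my_schema.table_frq
```

Routing every tableExists() call through a helper like this removes the possibility that two compute sessions resolve a bare name against different default catalogs or schemas.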

4. Drop and verify in the same session

If your workflow involves dropping a table and then checking if it exists, do both operations in the same Spark session to avoid cross-session caching issues:

spark.sql("DROP TABLE IF EXISTS your_catalog.your_schema.table_frq")
# The same session's cache should be invalidated by the DROP
exists = spark.catalog.tableExists("your_catalog.your_schema.table_frq")

5. For critical pipelines, add a retry with a short delay

In rare cases where you need cross-session consistency, adding a brief pause and re-check can help:

import time

spark.sql("CLEAR CACHE")
time.sleep(2)

if not spark.catalog.tableExists("your_catalog.your_schema.table_frq"):
# Table is confirmed dropped, proceed
pass
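The delay-and-recheck pattern above can be generalized into a small polling helper that accepts any existence check as a callable. The function name, defaults, and the simulated check below are mine, not a Databricks API; in a real notebook the callable would be something like lambda: spark.catalog.tableExists("your_catalog.your_schema.table_frq").

```python
import time

def wait_until_dropped(exists_check, timeout_s=30.0, interval_s=2.0):
    """Poll exists_check() until it returns False or the timeout elapses.

    exists_check is any zero-argument callable returning True while the
    table is still visible. Returns True once the table is confirmed
    gone, False if it is still visible at the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not exists_check():
            return True
        time.sleep(interval_s)
    return not exists_check()  # one final check at the deadline

# Simulated check: reports the table as present for the first two polls,
# then as dropped -- standing in for metadata propagation delay.
state = {"polls": 0}
def fake_check():
    state["polls"] += 1
    return state["polls"] <= 2

print(wait_until_dropped(fake_check, timeout_s=5.0, interval_s=0.01))  # True
```

Injecting the check as a callable keeps the helper testable outside Databricks and lets you reuse it for the SQL-based checks from option 1 as well.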


WHY IT "FIXED ITSELF"

The fact that DROP TABLE behavior appeared to return to normal is consistent with cache expiration. The Spark metadata cache has a time-to-live, and once that TTL expires, subsequent calls to spark.catalog.tableExists() fetch fresh metadata from Unity Catalog and correctly reflect the current state. This is not instability -- it is expected caching behavior with eventual consistency across compute sessions.
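The TTL effect can be sketched in a few lines of plain Python. This is a toy model of a time-bounded cache, not Databricks internals, and the actual TTL value in the platform is not documented here; the clock is injectable purely so the behavior is easy to demonstrate.

```python
# Toy TTL cache (illustration only): cached answers can be stale for up
# to ttl_s seconds, then self-correct -- the "fixed itself" behavior.

class TTLCatalogCache:
    def __init__(self, lookup, ttl_s, clock):
        self.lookup = lookup   # authoritative check, e.g. the metastore
        self.ttl_s = ttl_s
        self.clock = clock     # injectable for deterministic demos
        self.entries = {}      # name -> (cached answer, fetch time)

    def table_exists(self, name):
        now = self.clock()
        if name in self.entries:
            answer, fetched_at = self.entries[name]
            if now - fetched_at < self.ttl_s:
                return answer  # within TTL: served stale, never re-checked
        answer = self.lookup(name)
        self.entries[name] = (answer, now)
        return answer

tables = {"table_frq"}
fake_now = [0.0]
cache = TTLCatalogCache(lambda n: n in tables,
                        ttl_s=60.0, clock=lambda: fake_now[0])

cache.table_exists("table_frq")   # True, and now cached
tables.discard("table_frq")       # the table is dropped elsewhere

print(cache.table_exists("table_frq"))  # True -- stale, TTL not expired
fake_now[0] = 61.0                      # let the TTL lapse
print(cache.table_exists("table_frq"))  # False -- cache refreshed
```

Once the TTL lapses, the next lookup goes back to the source of truth, which is why the inconsistency disappeared on its own a day later.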


RECOMMENDED APPROACH

For the most robust development workflow, I would recommend option 1 above (SQL-based existence checks) or option 4 (drop and verify in the same session). These approaches bypass the Spark catalog cache entirely or ensure the cache is consistent within a single session.

If you continue to observe this behavior even within the same session after running DROP TABLE, that would be worth opening a support ticket for, as it could indicate a platform-level issue.


DOCUMENTATION REFERENCES

- DROP TABLE syntax and behavior: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-drop-table.html
- REFRESH TABLE (cache invalidation): https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-cache-refresh-table.html
- CLEAR CACHE command: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-cache-clear-cache.html
- Managed tables in Unity Catalog: https://docs.databricks.com/en/tables/managed.html
- UNDROP TABLE (recovering dropped tables): https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-undrop-table.html
- Disk cache vs Spark cache: https://docs.databricks.com/en/optimizations/disk-cache.html

Hope this helps make your workflow more predictable. Let us know if any of these approaches resolves the inconsistency for you.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.
