@NW1000
Thanks for the thorough description. I can understand the concern about inconsistent behavior here. The previous reply from nayan_wylde is on the right track -- this is almost certainly a metadata caching issue, not data corruption. Let me give you concrete actions you can take to make your workflow reliable and consistent.
WHY THIS HAPPENS
When you run DROP TABLE via a SQL query (for example, in the SQL editor or a SQL warehouse), the Unity Catalog metastore processes the DDL and removes the table's metadata. However, Spark sessions on other compute resources (notebooks on serverless, classic clusters, etc.) maintain their own local metadata cache. The spark.catalog.tableExists() API reads from this local Spark catalog cache, which may not immediately reflect changes made by a different compute session or a different cluster.
This is especially pronounced when:
- You drop a table in one compute context (e.g., SQL editor) and check existence in another (e.g., a Python notebook on serverless)
- You have long-running Spark sessions or clusters that have already cached the table metadata
- You are using serverless compute, which may reuse warm executor pools that retain cached metadata from prior executions
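To make the mechanism concrete, here is a toy model of a per-session metadata cache with a time-to-live. This is purely illustrative and not the actual Spark implementation; the class name, TTL value, and backing-store shape are all invented for the sketch. It shows how a session can keep answering "the table exists" after another session has dropped it, until its cache entry expires:

```python
import time

class SessionMetadataCache:
    """Toy model of a per-session metadata cache with a TTL.

    Illustrative only -- not the real Spark/Unity Catalog code. The
    'backing store' stands in for the metastore; each session holds
    its own cache, so a drop done elsewhere is invisible until the
    cached entry's TTL expires.
    """

    def __init__(self, backing_store, ttl_s=60.0, clock=time.monotonic):
        self.backing = backing_store   # shared "metastore": a set of table names
        self.ttl_s = ttl_s
        self.clock = clock             # injectable clock, handy for testing
        self._cache = {}               # table name -> (exists, fetched_at)

    def table_exists(self, name):
        hit = self._cache.get(name)
        if hit is not None and self.clock() - hit[1] < self.ttl_s:
            return hit[0]              # cached (possibly stale) answer
        exists = name in self.backing  # fresh lookup against the "metastore"
        self._cache[name] = (exists, self.clock())
        return exists
```

Under this model, a drop performed "elsewhere" is reflected only after the TTL lapses, which also matches the "fixed itself" observation.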
RECOMMENDED ACTIONS FOR CONSISTENT BEHAVIOR
Here are several approaches you can use, from simplest to most robust:
1. Use REFRESH TABLE or SQL-based existence checks
Instead of relying on spark.catalog.tableExists(), run a SQL-based check that forces a fresh metadata lookup:
# Option A: Use SQL to check existence (more reliable under UC)
def table_exists(spark, table_name):
    try:
        spark.sql(f"DESCRIBE TABLE {table_name}")
        return True
    except Exception:
        return False
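One caveat with the helper above: a bare `except Exception` also swallows permission and connectivity errors, reporting them as "table does not exist". A stricter variant (the helper name is mine, and matching on the `TABLE_OR_VIEW_NOT_FOUND` error-class string is an assumption about the error message format; verify against the errors your workspace actually raises) could look like:

```python
def table_exists_strict(spark, table_name):
    """Existence check that only treats 'not found' errors as absence.

    Any other failure (permissions, network, bad syntax) is re-raised
    rather than being silently reported as 'table does not exist'.
    """
    try:
        spark.sql(f"DESCRIBE TABLE {table_name}")
        return True
    except Exception as e:
        # Assumption: UC 'not found' errors carry this error class in the message.
        if "TABLE_OR_VIEW_NOT_FOUND" in str(e):
            return False
        raise
```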
Or use SHOW TABLES with a filter:
result = spark.sql("SHOW TABLES IN your_catalog.your_schema LIKE 'table_frq'")
exists = result.count() > 0
2. Invalidate the Spark cache before checking
If you want to keep using spark.catalog.tableExists(), force a cache refresh first:
# Clear all cached metadata
spark.sql("CLEAR CACHE")
# Now check -- this should reflect the current state
spark.catalog.tableExists("table_frq")
Or target a specific table (note: this only works if the table still exists):
spark.sql("REFRESH TABLE your_catalog.your_schema.table_frq")
3. Use fully qualified table names
Always use three-part names (catalog.schema.table) to avoid any ambiguity about which catalog or schema context is being used:
spark.catalog.tableExists("your_catalog.your_schema.table_frq")
This ensures you are not accidentally checking against a different catalog or schema default that may vary between compute sessions.
4. Drop and verify in the same session
If your workflow involves dropping a table and then checking if it exists, do both operations in the same Spark session to avoid cross-session caching issues:
spark.sql("DROP TABLE IF EXISTS your_catalog.your_schema.table_frq")
# The same session's cache should be invalidated by the DROP
exists = spark.catalog.tableExists("your_catalog.your_schema.table_frq")
5. For critical pipelines, add a retry with a short delay
In rare cases where you need cross-session consistency, adding a brief pause and re-check can help:
import time
spark.sql("CLEAR CACHE")
time.sleep(2)
if not spark.catalog.tableExists("your_catalog.your_schema.table_frq"):
    # Table is confirmed dropped, proceed
    pass
WHY IT "FIXED ITSELF"
The fact that DROP TABLE behavior appeared to return to normal is consistent with cache expiration. The Spark metadata cache has a time-to-live, and once that TTL expires, subsequent calls to spark.catalog.tableExists() fetch fresh metadata from Unity Catalog and correctly reflect the current state. This is not instability -- it is expected caching behavior with eventual consistency across compute sessions.
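If you need a pipeline to tolerate this eventual consistency rather than assume a fixed sleep is enough, a small polling helper can wait for the drop to become visible. This is a generic sketch of my own (the function name and parameters are invented); the existence check is passed in as a callable so you could plug in any of the checks from the options above:

```python
import time

def wait_until_dropped(exists_fn, timeout_s=30.0, poll_s=2.0, sleep_fn=time.sleep):
    """Poll an existence check until it reports the table is gone.

    exists_fn : zero-arg callable returning True while the table is still visible.
    Returns True if the table disappeared within the timeout, else False.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not exists_fn():
            return True
        sleep_fn(poll_s)
    return not exists_fn()  # one final check after the timeout
```

For example, `wait_until_dropped(lambda: spark.catalog.tableExists("your_catalog.your_schema.table_frq"))` would poll every two seconds for up to thirty seconds.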
RECOMMENDED APPROACH
For the most robust development workflow, I would recommend option 1 above (SQL-based existence checks) or option 4 (drop and verify in the same session). These approaches bypass the Spark catalog cache entirely or ensure the cache is consistent within a single session.
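Combining those two recommendations, a single helper can drop and verify in one session using a SQL-based check. The function name is mine, not a Databricks API; it is a minimal sketch under the assumptions above:

```python
def drop_table_and_verify(spark, table_name):
    """Drop a table and confirm, within the same session, that it is gone.

    Uses DESCRIBE TABLE rather than spark.catalog.tableExists() so the
    check does not depend on the session's metadata cache.
    Returns True if the table is no longer visible after the DROP.
    """
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    try:
        spark.sql(f"DESCRIBE TABLE {table_name}")
        return False  # DESCRIBE succeeded: table still visible
    except Exception:
        return True   # DESCRIBE failed: drop confirmed
```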
If you continue to observe this behavior even within the same session after running DROP TABLE, that would be worth opening a support ticket for, as it could indicate a platform-level issue.
DOCUMENTATION REFERENCES
- DROP TABLE syntax and behavior: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-drop-table.html
- REFRESH TABLE (cache invalidation): https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-cache-refresh-table.html
- CLEAR CACHE command: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-cache-clear-cache.html
- Managed tables in Unity Catalog: https://docs.databricks.com/en/tables/managed.html
- UNDROP TABLE (recovering dropped tables): https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-undrop-table.html
- Disk cache vs Spark cache: https://docs.databricks.com/en/optimizations/disk-cache.html
Hope this helps make your workflow more predictable. Let us know if any of these approaches resolves the inconsistency for you.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.