Same Delta Table, Different Behavior: Dev vs Prod Workspace in Databricks

Smriti2
New Contributor II

I recently ran into an interesting Databricks behavior while implementing a row-count comparison using Delta Time Travel (VERSION AS OF).

Platform: Azure
Scenario:

  • Same Unity Catalog
  • Same fully qualified table
  • Same table ID, location, and Delta format

Yet the behavior differed across environments.

What worked in Dev

  • I ran the notebook interactively
  • Using an all-purpose cluster
  • Delta Time Travel (VERSION AS OF) worked as expected

What failed in Prod

  • The same notebook ran as a scheduled Job
  • Executed on a job cluster in the prod workspace, as a scheduled job with a single notebook task
  • The exact same Delta table failed with:

TIME TRAVEL is not allowed. Operation not supported on Streaming Tables

The surprising part 

The table itself was unchanged:

  • Same catalog
  • Same location
  • Same Delta properties
  • Same table ID

My code compares active row counts between the last two Delta versions of a table, and fails if the row count drops more than 15%, using Delta time travel (VERSION AS OF) to read past snapshots.
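The validation logic described above can be sketched as a small pure-Python check; the table name, Spark session, and version numbers are assumptions here, so the actual time travel reads are shown only as comments:

```python
def row_count_drop_exceeded(prev_count: int, curr_count: int,
                            max_drop: float = 0.15) -> bool:
    """Return True if the active row count fell by more than max_drop
    (a fraction, e.g. 0.15 for 15%) between two Delta versions."""
    if prev_count == 0:
        # An empty previous snapshot cannot "drop"; treat as passing.
        return False
    drop = (prev_count - curr_count) / prev_count
    return drop > max_drop

# In a notebook, the counts would come from time travel reads, e.g.:
#   prev = spark.read.option("versionAsOf", v - 1).table(tbl).count()
#   curr = spark.read.option("versionAsOf", v).table(tbl).count()
print(row_count_drop_exceeded(1000, 800))  # a 20% drop, over the threshold
print(row_count_drop_exceeded(1000, 900))  # a 10% drop, within the threshold
```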

6 REPLIES

pradeep_singh
Contributor III

This could happen if the Delta table in production is a streaming table, while the Delta table in development is a regular Delta table. Time travel doesn’t work well with streaming tables and materialized views.

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

I checked the details of both tables: they point to the same location and the same table ID in both workspaces, and the format is Delta no matter which workspace I query from.

So you have the same cloud storage location as the location for this table. It's essentially the same metastore, same catalog, same schema, same location, same credentials. The only thing that's different is the workspace?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Yes that is true

Viraj92
New Contributor II

Try refreshing the table in the same job.

https://learn.microsoft.com/en-gb/azure/databricks/ldp/dbsql/streaming

As per the doc: "You might need to refresh your streaming table before using time travel queries."
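If the table really is a streaming table, that refresh-then-query pattern might look like the sketch below; the fully qualified name is a placeholder, and in a notebook the statement would be executed with spark.sql(...):

```python
def refresh_statement(fully_qualified_name: str) -> str:
    """Build the SQL that refreshes a streaming table before a
    time travel query. Only applies to actual streaming tables."""
    return f"REFRESH STREAMING TABLE {fully_qualified_name}"

stmt = refresh_statement("catalog.schema.your_table")
print(stmt)
# In a notebook, run it and then the time travel read, e.g.:
#   spark.sql(stmt)
#   spark.sql("SELECT COUNT(*) FROM catalog.schema.your_table VERSION AS OF 5")
```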

SteveOstrowski
Databricks Employee

Hi @Smriti2,

The error "TIME TRAVEL is not allowed. Operation not supported on Streaming Tables" when the table is not intentionally a streaming table typically points to the table having pipeline-related metadata attached to it in the prod workspace. Here are the most likely causes and how to investigate.

STEP 1: CHECK THE TABLE TYPE IN PROD

Run these commands in your prod workspace against the table in question:

DESCRIBE EXTENDED catalog.schema.your_table;

Look at the "Type" field in the output. If it shows STREAMING_TABLE instead of MANAGED or EXTERNAL, the table is registered as a streaming table in Unity Catalog, which blocks time travel. Also check:

SHOW TBLPROPERTIES catalog.schema.your_table;

Look for properties like:
- pipelines.pipelineId
- pipelines.metastore

If these properties exist, the table is being managed by a Lakeflow Spark Declarative Pipeline (SDP, formerly known as DLT). Tables managed by a pipeline are treated as streaming tables and have time travel disabled by design, because the pipeline controls the table's lifecycle and versioning.
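That check can be automated — a minimal sketch, assuming the SHOW TBLPROPERTIES output has already been collected into a key/value dict (the pipelines.* property names are the ones mentioned above):

```python
PIPELINE_PROPERTY_PREFIXES = ("pipelines.",)

def is_pipeline_managed(tbl_properties: dict) -> bool:
    """Return True if any table property indicates the table is managed
    by a Lakeflow SDP/DLT pipeline (e.g. pipelines.pipelineId)."""
    return any(key.startswith(PIPELINE_PROPERTY_PREFIXES)
               for key in tbl_properties)

# In a notebook, the dict would be built from the SQL output, e.g.:
#   rows = spark.sql("SHOW TBLPROPERTIES catalog.schema.your_table").collect()
#   props = {r["key"]: r["value"] for r in rows}
props = {"pipelines.pipelineId": "abc-123", "delta.minReaderVersion": "1"}
print(is_pipeline_managed(props))  # pipeline property present
```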

WHY IT WORKS IN DEV BUT NOT PROD

There are a few scenarios that explain this difference:

1. The table in prod was created or is managed by a pipeline (even if it was not your intention). If someone created the table using CREATE STREAMING TABLE or through an SDP pipeline definition, it gets the streaming table designation. The same fully qualified table name in dev may have been created as a regular Delta table (CREATE TABLE), which does support time travel.

2. The table was recreated or replaced in prod by a pipeline run. Even if the original table was a regular Delta table, if a pipeline was later configured to write to that same table name using a streaming table definition, the table metadata gets overwritten with the streaming table type.

3. Different Databricks Runtime versions. While less likely for this specific error, confirm that both clusters are running the same DBR version. Certain runtime versions may enforce streaming table restrictions more strictly.

4. Cluster configuration differences. All-purpose clusters in dev sometimes have different spark config defaults than job clusters. If the job cluster is configured as part of a pipeline, it would enforce pipeline semantics on the table.

STEP 2: CONFIRM THE TABLE IS IDENTICAL

Even though you mentioned the table shares the same catalog, location, and table ID, verify by running the following in both workspaces:

SELECT * FROM system.information_schema.tables
WHERE table_catalog = 'your_catalog'
  AND table_schema = 'your_schema'
  AND table_name = 'your_table';

Compare the table_type column between dev and prod. Also compare:

DESCRIBE HISTORY catalog.schema.your_table;

If the table truly is the same (same table ID, same location), the behavior difference could be caused by the compute environment. Specifically, if the prod job is running inside a pipeline task (a Lakeflow SDP task in a workflow), the runtime enforces streaming table semantics on tables it manages.

STEP 3: RESOLUTION OPTIONS

If the table is incorrectly marked as a streaming table:

Option A: If the table should be a regular Delta table, you can drop and recreate it as a standard managed table (CREATE TABLE ... AS SELECT), then re-run your time travel queries. Be careful to back up data first.

Option B: If the table genuinely needs to be a streaming table for ingestion purposes but you also need version comparison, use DESCRIBE HISTORY to get version metadata and row counts instead of reading historical versions directly:

DESCRIBE HISTORY catalog.schema.your_table LIMIT 2;

This gives you version numbers, timestamps, and operation metrics (including numOutputRows) without requiring time travel reads.

Option C: If your prod job is running as a task inside a pipeline, move the validation notebook to a separate workflow task (a standard notebook task on a job cluster, not a pipeline task). This ensures the notebook runs with standard Delta semantics rather than pipeline semantics.

STEP 4: ALTERNATIVE APPROACH FOR ROW COUNT VALIDATION

Since your use case is comparing active row counts between versions, consider using the operation metrics from DESCRIBE HISTORY directly:

history_df = spark.sql("DESCRIBE HISTORY catalog.schema.your_table LIMIT 2")
history_df.select("version", "timestamp", "operationMetrics").show(truncate=False)

The operationMetrics map contains keys like numOutputRows, numAddedFiles, and numRemovedFiles that can help you detect unexpected drops without needing VERSION AS OF.
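The drop check itself can then run on those metrics instead of full reads — a sketch, assuming the two history rows have been collected and that operationMetrics contains numOutputRows for both versions (not every Delta operation records it, so real code should handle missing keys):

```python
def drop_from_history(metrics_newer: dict, metrics_older: dict) -> float:
    """Fractional row-count drop between two versions, computed from the
    numOutputRows entries of DESCRIBE HISTORY operationMetrics (which
    are returned as strings)."""
    older = int(metrics_older["numOutputRows"])
    newer = int(metrics_newer["numOutputRows"])
    if older == 0:
        return 0.0
    return (older - newer) / older

# In a notebook, the dicts would come from e.g.:
#   rows = spark.sql("DESCRIBE HISTORY catalog.schema.t LIMIT 2").collect()
#   newer, older = rows[0]["operationMetrics"], rows[1]["operationMetrics"]
print(drop_from_history({"numOutputRows": "850"}, {"numOutputRows": "1000"}))
```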

Alternatively, if you want to keep the time travel approach, you can read the Delta log directly:

from delta.tables import DeltaTable
dt = DeltaTable.forName(spark, "catalog.schema.your_table")
current_version = dt.history(1).select("version").first()[0]

Then use the version number with spark.read.option("versionAsOf", current_version - 1).table(...), but only if the table type allows it.

DOCUMENTATION REFERENCES

- Delta time travel: https://docs.databricks.com/aws/en/delta/history.html
- Streaming tables overview: https://docs.databricks.com/aws/en/tables/streaming.html
- DESCRIBE HISTORY: https://docs.databricks.com/aws/en/sql/language-manual/delta-describe-history.html
- Table properties reference: https://docs.databricks.com/aws/en/delta/table-properties.html

If you can share the output of DESCRIBE EXTENDED and SHOW TBLPROPERTIES from both workspaces, that will confirm exactly what is causing the difference.

* This reply used an agent system I built to research and draft this response, based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update the response if I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.