I'm observing different behavior between Databricks Runtime versions when working with DataFrames and temporary views, and would appreciate any clarification.
In both environments, I performed the following steps in a notebook (each connected to its own cluster):
1. transformed_df.createOrReplaceTempView("source_vw")
2. {few transformations}
3. transformed_df.count()
4. transformed_df.createOrReplaceTempView("source_vw")
5. transformed_df.count()
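For anyone who wants to reproduce this, here is a minimal sketch of the sequence above. The helper name compare_counts and the transform parameter are hypothetical: transform stands in for the transformations in step 2, which I have not spelled out here, and is assumed to take a DataFrame and return a DataFrame.

```python
def compare_counts(transformed_df, transform, view_name="source_vw"):
    """Run the five steps and return (count at step 3, count at step 5).

    `transform` is a hypothetical placeholder for the step 2 transformations:
    any callable that takes a DataFrame and returns a DataFrame.
    """
    # Step 1: register the temporary view.
    transformed_df.createOrReplaceTempView(view_name)

    # Step 2: apply the transformations, reusing the same variable name.
    transformed_df = transform(transformed_df)

    # Step 3: first count.
    count_step3 = transformed_df.count()

    # Step 4: re-register the view under the same name.
    transformed_df.createOrReplaceTempView(view_name)

    # Step 5: second count, with no writes to the source in between.
    count_step5 = transformed_df.count()

    return count_step3, count_step5


# In a notebook (where `spark` is predefined), usage might look like this,
# with the table name and `my_transformations` being placeholders:
# step3, step5 = compare_counts(spark.table("<table>"), my_transformations)
# print(step3, step5, step3 == step5)
```

Running the same helper on both runtimes with the same input should make the difference between the two counts easy to compare side by side.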
The transformations (step 2) reassign the same DataFrame variable (transformed_df), and at certain points the temporary view is re-created under the same name (source_vw).
In DBR 15.4, reusing the same names for the DataFrame and the temporary view doesn't appear to overwrite the DataFrame or the temporary view as expected.
Behavior:
- In DBR 13.3 LTS, the results from Step 3 and Step 5 are identical (which is expected).
- In DBR 15.4 LTS, the result from Step 5 differs from the result from Step 3.
To clarify, there were no writes or modifications to the underlying Delta table between the two counts. I’m just calling createOrReplaceTempView() a second time.
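One way to double-check that the table itself did not change is to read its latest Delta commit version before and after the two counts. This is only a sketch: latest_delta_version is a hypothetical helper name, and it assumes the source is a Delta table that can be addressed by name and that spark is the notebook's SparkSession.

```python
def latest_delta_version(spark, table_name):
    """Return the most recent commit version of a Delta table.

    Assumes `table_name` refers to a Delta table; DESCRIBE HISTORY
    lists commits newest first, so LIMIT 1 keeps only the latest one.
    """
    rows = spark.sql(f"DESCRIBE HISTORY {table_name} LIMIT 1").collect()
    return rows[0]["version"]


# Usage in a notebook (table name is a placeholder):
# v_before = latest_delta_version(spark, "<table>")
# ... run the two counts ...
# v_after = latest_delta_version(spark, "<table>")
# print(v_before == v_after)
```

If the version is the same before and after, the difference in counts would have to come from how the DataFrame or view is resolved rather than from the data.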
Images are attached for reference.
Thanks in advance for any insight or references to release notes that could help explain this behavior.