CREATE TABLE does not overwrite location whereas CREATE OR REPLACE TABLE does
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2024 01:33 AM
I am working on Azure Databricks, with Databricks Runtime version being - 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue.
Suppose I have a view named v1 and a database f1_processed created from the following command.
CREATE DATABASE IF NOT EXISTS f1_processed LOCATION "abfss://processed@formula1dl679student.dfs.core.windows.net/"
Then if I try to create a table at a location that already exists using the command below.
CREATE TABLE f1_processed.circuits AS SELECT * FROM v1;
I get the following error
[DELTA_CREATE_TABLE_WITH_NON_EMPTY_LOCATION] Cannot create table ('`spark_catalog`.`f1_processed`.`circuits`').
The associated location ('abfss://processed@formula1dl679student.dfs.core.windows.net/circuits')
is not empty and also not a Delta table. SQLSTATE: 42601However, if I replace CREATE with CREATE OR REPLACE, then the command runs fine. So the following code runs fine.
CREATE OR REPLACE TABLE f1_processed.circuits AS SELECT * FROM v1;
The table didn't exist before. So CREATE OR REPLACE is also basically creating a table. Shouldn't the behaviour be consistent with the CREATE command?
Also, the table creation is only happening for delta format. If I specify the format to be any other like parquet, then it fails. Checkout this question
Is it a bug? Any help is appreciated
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2024 06:41 AM
Hi @Dhruv-22 ,
Based on the information you shared above, the "CREATE OR REPLACE" and "CREATE" commands in Databricks do have different behaviours, particularly when it comes to handling tables with specific target locations.
The "CREATE OR REPLACE" command is designed to either create a new table or replace an existing one. On the other hand, the "CREATE" command is used to create a new table.
Please refer this documentation for more details: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html
Hope this helps you and please leave a like if this helps, followups are appreciated.
Kudos
Ayushi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2024 09:09 AM
Okay, I read the documentation. I think since REPLACE maintains the version history, even if the location has some files already it keeps them for that purpose. Am I right?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2024 08:55 PM
Hi @Dhruv-22
It seems there might be some confusion. The REPLACE command you're referring to is not explicitly mentioned in the context. However, there are references to the version history in Delta Lake tables.
In Delta Lake tables, each operation that modifies a table creates a new table version. You can use this history information to audit operations, rollback a table, or query a table at a specific point in time using time travel. There's also a RESTORE command that can restore a Delta table to its earlier state, but again, this doesn't directly relate to a REPLACE command.
I hope this helps you and leave a like if this helps, followups are appreciated.
Kudos
Ayushi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2024 10:35 PM
Hey @Ayushi_Suthar, I am talking about the link you shared - click here. It says the following
REPLACE
If specified replaces the table and its content if it already exists. This clause is only supported for Delta Lake tables.
REPLACE preserves the table history.
That is why I think that Delta does not delete the files that already exist at the location, in case they might contain the history of the previous version of the table. And since it does not need to delete the data that is already there in the location it can create a managed table without any error. Whereas other formats need to delete the data and produce errors. Am I right?