cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

CREATE TABLE does not overwrite location whereas CREATE OR REPLACE TABLE does

Dhruv-22
New Contributor III

I am working on Azure Databricks, with Databricks Runtime version being - 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue.

Suppose I have a view named v1 and a database f1_processed created from the following command.

CREATE DATABASE IF NOT EXISTS f1_processed
LOCATION "abfss://processed@formula1dl679student.dfs.core.windows.net/"

Then if I try to create a table at a location that already exists using the command below.

CREATE TABLE f1_processed.circuits 
AS 
SELECT * FROM v1;

I get the following error

[DELTA_CREATE_TABLE_WITH_NON_EMPTY_LOCATION] Cannot create table ('`spark_catalog`.`f1_processed`.`circuits`'). 
The associated location ('abfss://processed@formula1dl679student.dfs.core.windows.net/circuits') 
is not empty and also not a Delta table. SQLSTATE: 42601

However, if I replace CREATE with CREATE OR REPLACE, then the command runs fine. So the following code runs fine.

CREATE OR REPLACE TABLE f1_processed.circuits 
AS 
SELECT * FROM v1;

The table didn't exist before. So CREATE OR REPLACE is also basically creating a table. Shouldn't the behaviour be consistent with the CREATE command?

Also, the table creation is only happening for delta format. If I specify the format to be any other like parquet, then it fails. Checkout this question

Is it a bug? Any help is appreciated

4 REPLIES 4

Ayushi_Suthar
Databricks Employee
Databricks Employee

Hi @Dhruv-22 , 

Based on the information you shared above, the "CREATE OR REPLACE" and "CREATE" commands in Databricks do have different behaviours, particularly when it comes to handling tables with specific target locations.

The "CREATE OR REPLACE" command is designed to either create a new table or replace an existing one. On the other hand, the "CREATE" command is used to create a new table. 

Please refer this documentation for more details: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

Hope this helps you and please leave a like if this helps, followups are appreciated.
Kudos
Ayushi

Okay, I read the documentation. I think since REPLACE maintains the version history, even if the location has some files already it keeps them for that purpose. Am I right?

Hi @Dhruv-22 

It seems there might be some confusion. The REPLACE command you're referring to is not explicitly mentioned in the context. However, there are references to the version history in Delta Lake tables.

In Delta Lake tables, each operation that modifies a table creates a new table version. You can use this history information to audit operations, rollback a table, or query a table at a specific point in time using time travel. There's also a RESTORE command that can restore a Delta table to its earlier state, but again, this doesn't directly relate to a REPLACE command.

I hope this helps you and leave a like if this helps, followups are appreciated.
Kudos
Ayushi

Hey @Ayushi_Suthar, I am talking about the link you shared - click here. It says the following

REPLACE

If specified replaces the table and its content if it already exists. This clause is only supported for Delta Lake tables.

REPLACE preserves the table history.


That is why I think that Delta does not delete the files that already exist at the location, in case they might contain the history of the previous version of the table. And since it does not need to delete the data that is already there in the location it can create a managed table without any error. Whereas other formats need to delete the data and produce errors. Am I right?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group