cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Unable to analyze external table | FileAlreadyExistsException

Miasu
New Contributor II

Hello experts, 

There's a csv file, "nyc_taxi.csv" saved under users/myfolder on DBFS, and I used this file created 2 tables:

1. nyc_taxi : created using the UI, and it appeared as a managed table saved under dbfs:/user/hive/warehouse/mydatabase.db/nyc_taxi

2. nyc_taxi2: created using the SQL commands below, and it shows as an external table, location: dbfs:/users/myfolder/nyc_taxi.csv

CREATE TABLE nyc_taxi2 
(vendor_id String,
pickup_datetime timestamp,
dropoff_datetime timestamp,
passenger_count int,
trip_distance double,
pickup_longitude double,
pickup_latitude double,
rate_code int,
store_and_fwd_flag string,
dropoff_longitude double,
dropoff_latitude double,
payment_type string,
fare_amount double,
surcharge double,
mta_tax double,
tip_amount double,
tolls_amount double,
total_amount double)
USING CSV OPTIONS("path"="/users/myfolder/nyc_taxi.csv","header" = "true");

The command below for nyc_taxi worked fine, 

 ANALYZE TABLE nyc_taxi compute statistics for all columns;

 whereas the same command for nyc_taxi2 raised a FileAlreadyExistsException error. (other commands (SELECT...FROM) works fine with the nyc_taxi2 table, but only the ANALYZE TABLE command so far)

ANALYZE TABLE nyc_taxi2 compute statistics for all columns;

[FileAlreadyExistsException: Operation failed: "The specified path, or an element of the path, exists and its resource type is invalid for this operation.", 409, GET,......, PathConflict, "The specified path, or an element of the path, exists and its resource type is invalid for this operation.]

How can I resolve the issue? 

Thanks for the help! 

2 REPLIES 2

NandiniN
Databricks Employee
Databricks Employee

Hi @Miasu 

You cannot point an external table to a managed table location. Managed tables and external tables have distinct storage configurations and purposes within Databricks and Unity Catalog.

The ANALYZE TABLE command also works differently for managed and external tables. In your case, nyc_taxi2 is an external table pointing to a specific file on DBFS (/users/myfolder/nyc_taxi.csv), hence the external table has a conflict with the operation you're trying to perform.

The reason SELECT works fine and ANALYZE TABLE doesn’t comes down to the fact that reading data from an external location doesn’t require modifying the file or its metadata, while ANALYZE TABLE needs to store additional metadata (statistics), which causes a conflict with the existing file in the external location.

Hope this helps.

Thanks!

NandiniN
Databricks Employee
Databricks Employee

Did you initially want to create an external or managed table? 

Just trying to understand what was your intent for the file.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group