Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

FileAlreadyExistsException error while analyzing table in Notebook

Miasu
New Contributor II

Databricks experts, 

I'm new to Databricks and have run into an issue with the ANALYZE TABLE command in a notebook.

I created two tables, nyc_taxi and nyc_taxi2, from the same CSV file.

When I execute the following command in a notebook:

analyze table nyc_taxi2 compute statistics for columns passenger_count;

a FileAlreadyExistsException error is raised:

[FileAlreadyExistsException: Operation failed: "The specified path, or an element of the path, exists and its resource type is invalid for this operation.", 409, GET,......, PathConflict, "The specified path, or an element of the path, exists and its resource type is invalid for this operation.]

However, the same command works for the nyc_taxi table with no error. I also ran the nyc_taxi2 command in the SQL Editor, and it worked fine there. The problem only occurs when I run the command in a notebook, and I'm very confused as to why.
The only difference between nyc_taxi and nyc_taxi2 is that I created nyc_taxi using the UI and nyc_taxi2 using the Databricks SQL command given below.
CREATE TABLE nyc_taxi2 (
  vendor_id string,
  pickup_datetime timestamp,
  dropoff_datetime timestamp,
  passenger_count int,
  trip_distance double,
  pickup_longitude double,
  pickup_latitude double,
  rate_code int,
  store_and_fwd_flag string,
  dropoff_longitude double,
  dropoff_latitude double,
  payment_type string,
  fare_amount double,
  surcharge double,
  mta_tax double,
  tip_amount double,
  tolls_amount double,
  total_amount double
)
USING CSV
OPTIONS ("path" = "/users/myfolder/nyc_taxi.csv", "header" = "true");

Can anyone point out what could be causing this problem?
Thanks for the help!


1 REPLY

Miasu
New Contributor II

Hi @Retired_mod, thank you for your reply!

I realized that there is another major difference between nyc_taxi and nyc_taxi2: nyc_taxi, created using the UI, is a managed table, whereas nyc_taxi2, created using the SQL command, is an external table. The locations are also different: nyc_taxi is stored under "dbfs:/user/hive/warehouse/myschema.db/nyc_taxi", while nyc_taxi2 is stored under "dbfs:/users/feirxu/nyc_taxi.csv". Could this be the reason for the error? If so, could you advise how to resolve it?
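
For anyone following along, here is a minimal sketch of one way to check the table type and location mentioned above. It assumes both tables are in the current schema. In the output, the Type row should show MANAGED or EXTERNAL, and the Location row should show the storage path.

```sql
-- Show detailed metadata for the table created through the UI.
-- Rows of interest: Type, Location, Provider.
DESCRIBE TABLE EXTENDED nyc_taxi;

-- Same check for the table created with CREATE TABLE ... USING CSV.
DESCRIBE TABLE EXTENDED nyc_taxi2;
```

Comparing the Location and Provider rows side by side makes it easier to see whether a table points at a warehouse directory or at a single CSV file.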
I also checked the documentation, which says "With the UI, you can only create external tables." (https://learn.microsoft.com/en-us/azure/databricks/archive/legacy/data-tab#--create-a-table). So why is the table I created using the UI a managed table, while the table I created using the SQL command is an external table?
Did I miss something?
Thank you! 
