Error creating Unity Catalog external table

nwong
New Contributor II

I tried creating an external table from a partitioned parquet folder in Unity Catalog.

Initially, I created the table from the Data Ingestion UI. It worked but only a tiny portion of the table was actually loaded.

Next, I tried running a SQL DDL:

CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary ( ... )
USING parquet
LOCATION 's3://....'

I tried LOCATION as the root folder containing all subfolders and files, as well as the root folder with wildcards for subfolders. Querying this table threw the following error:

ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=b2e92078-29ce-4383-9501-3aa5adaa5e93 ErrorClass=BAD_REQUEST] Request URL is not related to this EXTERNAL table. The request url = Some(s3://....). The table storage path is Some(s3://...).

Can you tell me how external tables should be created in Unity Catalog from partitioned parquet files?

5 REPLIES

Alberto_Umana
Databricks Employee

Hi @nwong,

Ensure that the cloud storage URI you provide is complete and correct, including the scheme, since the error appears to be complaining about the URL.

Can you try using this: https://docs.databricks.com/en/tables/external-partition-discovery.html

CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name>
USING <format>
PARTITIONED BY (<partition-column-list>)
LOCATION 's3://<bucket-path>/<table-directory>';
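
For the table in this thread, that might look something like the sketch below (this assumes the Parquet files actually contain a partition_date column; substitute your real bucket path):

CREATE OR REPLACE TABLE prod.reporting.send_date_summary
USING parquet
PARTITIONED BY (partition_date)
LOCATION 's3://<bucket-path>/send_date_summary/';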

nwong
New Contributor II

Thanks for the suggestion.

I added 

PARTITIONED BY (partition_date)

but still not all of the data was loaded.

The parquet files do not use Hive-style partitioning. The folder format is

s3://bucket_name/send_date_summary/report/yyyy/mm/dd/*.parquet

where yyyy/mm/dd is the partition_date. 

Do I need to manually add each partition? There are hundreds. 

Is there an easier way to do this? 

And will enabling the partition log setting below automatically discover new partitions that are added to the folder?

SET spark.databricks.nonDelta.partitionLog.enabled = true;
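
For reference, my understanding from the docs is that this partition metadata logging setting only applies to tables created while it is enabled in the session, so the sequence would be something like the sketch below (same table and path as above; whether this works with a non-Hive folder layout is exactly what I'm unsure about):

SET spark.databricks.nonDelta.partitionLog.enabled = true;

CREATE TABLE prod.reporting.send_date_summary
USING parquet
PARTITIONED BY (partition_date)
LOCATION 's3://bucket_name/send_date_summary/report/';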

 

nwong
New Contributor II

I just tried manually adding a partition:
ALTER TABLE prod.reporting.send_date_summary
ADD PARTITION (year = 2022, month = 12, day = 05)
LOCATION 's3://bucket_name/send_date_summary/report/2022/12/05/';
 
and got an error:

[UC_COMMAND_NOT_SUPPORTED.WITHOUT_RECOMMENDATION] The command(s): ALTER TABLE (add partition) are not supported in Unity Catalog. SQLSTATE: 0AKUC
 
I also can't use MSCK REPAIR TABLE, because the folders don't use Hive-style partitioning.
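
For comparison, a Hive-style layout that MSCK REPAIR TABLE and automatic partition discovery could handle would encode the partition values in the path, e.g.:

s3://bucket_name/send_date_summary/report/year=2022/month=12/day=05/*.parquet

rather than the plain yyyy/mm/dd folders we have.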

mBusk
New Contributor II

Hi.

We are experiencing the exact same problem.
Does anyone have a solution to the issue?
Or do we really need to rename our folders to Hive-style year=2024/month=01/day=01?

royvansanten
New Contributor II

You can use recursiveFileLookup in OPTIONS, as shown in this topic: https://community.databricks.com/t5/data-engineering/external-table-from-external-location/td-p/6924...
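
A minimal sketch of that approach for the layout in this thread (table and bucket names taken from the earlier posts; note that recursiveFileLookup disables partition inference, so the table is read as a single unpartitioned dataset):

CREATE TABLE prod.reporting.send_date_summary
USING parquet
OPTIONS (recursiveFileLookup 'true')
LOCATION 's3://bucket_name/send_date_summary/report/';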
