Error creating Unity Catalog external table

nwong
New Contributor II

I tried creating an external table from a partitioned parquet folder in Unity Catalog.

Initially, I created the table from the Data Ingestion UI. It worked but only a tiny portion of the table was actually loaded.

Next, I tried running a SQL DDL:

CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary ( ... )
USING parquet
LOCATION 's3://....'

I tried LOCATION as the root folder containing all subfolders and files, as well as the root folder with wildcards for subfolders. Querying this table threw the following error:

ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=b2e92078-29ce-4383-9501-3aa5adaa5e93 ErrorClass=BAD_REQUEST] Request URL is not related to this EXTERNAL table. The request url = Some(s3://....). The table storage path is Some(s3://...).

Can you tell me how external tables should be created in Unity Catalog from partitioned parquet files?

5 REPLIES

Alberto_Umana
Databricks Employee

Hi @nwong,

Ensure that the cloud storage URI you provide is complete and correct, including the scheme, since the error appears to be complaining about the URL.

Can you try using this: https://docs.databricks.com/en/tables/external-partition-discovery.html

CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name>
USING <format>
PARTITIONED BY (<partition-column-list>)
LOCATION 's3://<bucket-path>/<table-directory>';
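
For the table in this thread, that might look something like the sketch below (this assumes the Parquet files actually contain a partition_date column; substitute your real bucket path):

CREATE OR REPLACE TABLE prod.reporting.send_date_summary
USING parquet
PARTITIONED BY (partition_date)
LOCATION 's3://<bucket-path>/send_date_summary/';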

nwong
New Contributor II

Thanks for the suggestion.

I added 

PARTITIONED BY (partition_date)

but still not all of the data was loaded.

The parquet files do not use Hive-style partitioning. The folder format is

s3://bucket_name/send_date_summary/report/yyyy/mm/dd/*.parquet

where yyyy/mm/dd is the partition_date. 

Do I need to manually add each partition? There are hundreds. 

Is there an easier way to do this? 

And will enabling the partition log setting below automatically discover new partitions that are added to the folder?

SET spark.databricks.nonDelta.partitionLog.enabled = true;
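
For reference, my understanding from the docs is that this partition metadata logging setting only applies to tables created while it is enabled in the session, so the sequence would be something like the sketch below (same table and path as above; whether this works with a non-Hive folder layout is exactly what I'm unsure about):

SET spark.databricks.nonDelta.partitionLog.enabled = true;

CREATE TABLE prod.reporting.send_date_summary
USING parquet
PARTITIONED BY (partition_date)
LOCATION 's3://bucket_name/send_date_summary/report/';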

 

nwong
New Contributor II

I just tried manually adding a partition:
ALTER TABLE prod.reporting.send_date_summary
ADD PARTITION (year = 2022, month = 12, day = 05)
LOCATION 's3://bucket_name/send_date_summary/report/2022/12/05/';
 
and got an error:

[UC_COMMAND_NOT_SUPPORTED.WITHOUT_RECOMMENDATION] The command(s): ALTER TABLE (add partition) are not supported in Unity Catalog. SQLSTATE: 0AKUC
 
I also can't use MSCK REPAIR TABLE, because the folders don't use Hive-style partitioning.
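
For comparison, a Hive-style layout that MSCK REPAIR TABLE and automatic partition discovery could handle would encode the partition values in the path, e.g.:

s3://bucket_name/send_date_summary/report/year=2022/month=12/day=05/*.parquet

rather than the plain yyyy/mm/dd folders we have.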

mBusk
New Contributor II

Hi.

We are experiencing the exact same problem.
Does anyone have a solution to the issue?
Or do we really need to rename our folders to Hive-style year=2024/month=01/day=01?

royvansanten
New Contributor II

You can use recursiveFileLookup in OPTIONS, as shown in this topic: https://community.databricks.com/t5/data-engineering/external-table-from-external-location/td-p/6924...
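
A minimal sketch of that approach for the layout in this thread (table and bucket names taken from the earlier posts; note that recursiveFileLookup disables partition inference, so the table is read as a single unpartitioned dataset):

CREATE TABLE prod.reporting.send_date_summary
USING parquet
OPTIONS (recursiveFileLookup 'true')
LOCATION 's3://bucket_name/send_date_summary/report/';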
