02-03-2025 11:55 AM
I tried creating an external table from a partitioned parquet folder in Unity Catalog.
Initially, I created the table from the Data Ingestion UI. It worked but only a tiny portion of the table was actually loaded.
Next, I tried running a SQL DDL statement:
CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary ( ... ) USING parquet LOCATION "s3:/...."
I tried LOCATION pointing at the root folder containing all subfolders and files, as well as the root folder with wildcards for the subfolders. Querying this table threw the following error:
ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=b2e92078-29ce-4383-9501-3aa5adaa5e93 ErrorClass=BAD_REQUEST] Request URL is not related to this EXTERNAL table. The request url = Some(s3://....). The table storage path is Some(s3://...).
Can you tell me how external tables should be created in Unity Catalog from partitioned parquet files?
02-03-2025 12:18 PM
Hi @nwong,
Ensure that the cloud storage URI you provide is complete and correct, including the scheme, since the error appears to be complaining about a mismatch between the request URL and the table's registered storage path.
Can you try the approach described here: https://docs.databricks.com/en/tables/external-partition-discovery.html
CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name>
USING <format>
PARTITIONED BY (<partition-column-list>)
LOCATION 's3://<bucket-path>/<table-directory>';
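If the folders follow Hive-style key=value naming, you may also need to sync the partition metadata into the metastore after creating the table. A minimal sketch, assuming the placeholder table name above (REPAIR TABLE with SYNC PARTITIONS is covered on the partition discovery page linked above):
REPAIR TABLE <catalog>.<schema>.<table-name> SYNC PARTITIONS;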
02-03-2025 03:45 PM
Thanks for the suggestion.
I added
PARTITIONED BY (partition_date)
but still not all of the data is loaded.
The parquet files are not stored in Hive-style partitioning. The folder format is
s3://bucket_name/send_date_summary/report/yyyy/mm/dd/*.parquet
where yyyy/mm/dd is the partition_date.
Do I need to manually add each partition? There are hundreds.
Is there an easier way to do this?
And will enabling the partition log automatically discover new partitions that are added to the folder?
SET spark.databricks.nonDelta.partitionLog.enabled = true;
02-03-2025 04:12 PM
I just tried manually adding a partition
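Something along these lines (a sketch; the date value is just an example of one of the partitions):
ALTER TABLE prod.reporting.send_date_summary
ADD IF NOT EXISTS PARTITION (partition_date = '2024-01-01')
LOCATION 's3://bucket_name/send_date_summary/report/2024/01/01/';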
โ02-07-2025 02:13 AM
Hi.
We are experiencing the exact same problem.
Does anyone have a solution to the issue?
Or do we really need to rename our partitioning to year=2024,month=01,day=01?
03-06-2025 02:20 AM
You can use recursiveFileLookup in OPTIONS, as shown in this thread: https://community.databricks.com/t5/data-engineering/external-table-from-external-location/td-p/6924...
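A sketch of what that could look like for the layout described above (table and bucket names are taken from the earlier posts; note that, as I understand it, enabling recursiveFileLookup disables partition inference, so the table is read as a single unpartitioned dataset):
CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary
USING parquet
OPTIONS (recursiveFileLookup 'true')
LOCATION 's3://bucket_name/send_date_summary/report/';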