02-03-2025 11:55 AM
I tried creating an external table from a partitioned parquet folder in Unity Catalog.
Initially, I created the table from the Data Ingestion UI. It appeared to work, but only a tiny portion of the data was actually loaded.
Next, I tried running a SQL DDL statement:

CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary ( ... )
USING parquet
LOCATION "s3:/....";

I tried LOCATION as the root folder containing all subfolders and files, and also as the root folder with wildcards for the subfolders. Querying this table threw the following error:

ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=b2e92078-29ce-4383-9501-3aa5adaa5e93 ErrorClass=BAD_REQUEST] Request URL is not related to this EXTERNAL table. The request url = Some(s3://....). The table storage path is Some(s3://...).
Can you tell me how external tables should be created in Unity Catalog from partitioned parquet files?
02-03-2025 12:18 PM
Hi @nwong,
Ensure that the cloud storage URI you provide is complete and correct, including the scheme, since the error appears to be complaining about a mismatch between the request URL and the table's storage path.
Can you try the approach described here: https://docs.databricks.com/en/tables/external-partition-discovery.html
CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name>
USING <format>
PARTITIONED BY (<partition-column-list>)
LOCATION 's3://<bucket-path>/<table-directory>';
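If your files are laid out in Hive-style directories (e.g. partition_date=2024-01-01), you can then typically register all existing partitions in one pass instead of adding them one at a time. A sketch, not specific to your setup:

MSCK REPAIR TABLE <catalog>.<schema>.<table-name>;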
02-03-2025 03:45 PM
Thanks for the suggestion.
I added
PARTITIONED BY (partition_date)
but not all of the data was loaded.
The parquet files are not stored in Hive-style partitioning. The folder format is
s3://bucket_name/send_date_summary/report/yyyy/mm/dd/*.parquet
where yyyy/mm/dd is the partition_date.
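To illustrate the mismatch (paths are made up to show the shape, using an arbitrary date):

Current layout: s3://bucket_name/send_date_summary/report/2024/01/15/file.parquet
Hive-style layout: s3://bucket_name/send_date_summary/report/partition_date=2024-01-15/file.parquet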
Do I need to manually add each partition? There are hundreds.
Is there an easier way to do this?
And will enabling the partition log automatically discover new partitions that are added to the folder?
SET spark.databricks.nonDelta.partitionLog.enabled = true;
02-03-2025 04:12 PM
I just tried manually adding a partition.
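For anyone following along, the statement I mean has this shape (a sketch based on the paths in my earlier post; the date is illustrative):

ALTER TABLE prod.reporting.send_date_summary
ADD PARTITION (partition_date = '2024-01-15')
LOCATION 's3://bucket_name/send_date_summary/report/2024/01/15';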
02-07-2025 02:13 AM
Hi.
We are experiencing the exact same problem.
Does anyone have a solution to the issue?
Or do we really need to rename our partition folders to the Hive-style layout, e.g. year=2024/month=01/day=01?
3 weeks ago
You can use recursiveFileLookup in OPTIONS, as shown in this topic: https://community.databricks.com/t5/data-engineering/external-table-from-external-location/td-p/6924...
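A sketch of what that could look like for the layout above (note that recursiveFileLookup reads all files under the location but disables partition inference, so partition_date would not be available as a column):

CREATE TABLE IF NOT EXISTS prod.reporting.send_date_summary
USING parquet
OPTIONS (recursiveFileLookup 'true')
LOCATION 's3://bucket_name/send_date_summary/report';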