Hi - I have existing parquet files in Cloudflare R2 storage (created outside of Databricks). I would like to share them via Delta Sharing, but I keep running into an error. Is it possible to share existing parquet files without duplicating them?
I did the following steps:
1. Created a storage credential and an external location pointing to an R2 bucket in a Databricks workspace (AWS)
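For reference, the external location part looked roughly like this (the storage credential was created separately beforehand; <location_name> and <credential_name> are placeholder names):
-- Rough sketch of step 1; the storage credential is referenced by name only:
CREATE EXTERNAL LOCATION IF NOT EXISTS <location_name>
URL 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
WITH (STORAGE CREDENTIAL <credential_name>);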
2. Created catalog:
CREATE CATALOG IF NOT EXISTS <catalog_name>
MANAGED LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
COMMENT 'Location for managed tables and volumes to share using Delta Sharing';
3. Created a Delta table:
CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`;
CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name>
USING DELTA
LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/';
4. Created share:
CREATE SHARE IF NOT EXISTS <share_name>;
5. Tried adding the table to the share (this fails):
ALTER SHARE <share_name>
ADD TABLE <catalog_name>.<schema_name>.<table_name>;
On step 5, I get the following error:
[RequestId=<id> ErrorClass=INVALID_STATE] An error occurred while trying to validate the partition spec of a shared table.
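Since the error complains about the partition spec, one diagnostic sketch (not part of my original steps) is to look at what the Delta table itself reports via DESCRIBE DETAIL, whose output includes a partitionColumns field:
-- Diagnostic only: inspect the table's metadata, including partitionColumns,
-- which is presumably what the share validation is checking.
DESCRIBE DETAIL <catalog_name>.<schema_name>.<table_name>;
Comparing this output against the same command run on the deep-cloned copy from the workaround below might show what differs between the two partition specs.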
Step 5 succeeds if, after step 3, I run the following and add the new table to the share instead; but that duplicates the data in R2, which is exactly what I'm trying to avoid:
CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<new_table_name> DEEP CLONE <catalog_name>.<schema_name>.<old_table_name>
LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<new_folder_name>/';
Steps 1-5 work when using an Amazon S3 external location instead of Cloudflare R2. Is there any way to share existing parquet files in R2 without duplicating them?