Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta share existing parquet files in R2

turtleXturtle
New Contributor II

Hi - I have existing Parquet files in Cloudflare R2 storage (created outside of Databricks). I would like to share them via Delta Sharing, but I keep running into an error. Is it possible to share existing Parquet files without duplicating them?

I did the following steps:

1. Created storage credential and external location pointing to an R2 bucket in Databricks Workspace (AWS)

2. Created catalog: 

 

CREATE CATALOG IF NOT EXISTS <catalog_name>
MANAGED LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
COMMENT 'Location for managed tables and volumes to share using Delta Sharing';

 

3. Created delta table: 

 

CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`;

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name>
USING DELTA
LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/';

 

4. Created share: 

 

CREATE SHARE IF NOT EXISTS <share_name>;

5. Tried adding table to share (this fails): 

 

 

ALTER SHARE <share_name>
ADD TABLE <catalog_name>.<schema_name>.<table_name>;

 

On step 5, I get the following error: 

 

[RequestId=<id> ErrorClass=INVALID_STATE] An error occurred while trying to validate the partition spec of a shared table.

 

Step 5 works if I run the following after step 3 and use the new table instead, but this duplicates the data in R2, which is what I'm trying to avoid: 

 

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<new_table_name> DEEP CLONE <catalog_name>.<schema_name>.<old_table_name>
  LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<new_folder_name>/';

 

Steps 1-5 work when using an Amazon S3 external location instead of Cloudflare R2. Is there any way to share existing Parquet files in R2 without duplication?
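
Since the deep clone can be added to the share and the converted table cannot, comparing the metadata of the two tables might help narrow down what the partition-spec validation objects to. The following is only a minimal diagnostic sketch, not a fix. It assumes both tables exist under the placeholder names used above, and that `<old_table_name>` is the table created in step 3:

```sql
-- Inspect the converted table; the `location` and `partitionColumns`
-- columns of the result are the ones of interest here.
DESCRIBE DETAIL <catalog_name>.<schema_name>.<old_table_name>;

-- Same check for the deep clone, which can be added to the share.
DESCRIBE DETAIL <catalog_name>.<schema_name>.<new_table_name>;

-- Review the operations recorded in the Delta log of the converted table
-- (for example the CONVERT entry and its parameters).
DESCRIBE HISTORY <catalog_name>.<schema_name>.<old_table_name>;

-- List what the share currently contains.
SHOW ALL IN SHARE <share_name>;
```

Any difference between the two `DESCRIBE DETAIL` results (location scheme, partition columns, table properties) would be a reasonable detail to include when reporting the error.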

1 REPLY

turtleXturtle
New Contributor II

Thanks @Retired_mod. It's currently possible to share a Delta table stored in an S3 external location without duplicating it or running a `DEEP CLONE` first. Is it on the roadmap to support this for R2 as well?