Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta share existing parquet files in R2

turtleXturtle
New Contributor II

Hi - I have existing parquet files in Cloudflare R2 storage (created outside of Databricks). I would like to share them via Delta Sharing, but I keep running into an error. Is it possible to share the existing parquet files without duplicating them?

I did the following steps:

1. Created storage credential and external location pointing to an R2 bucket in Databricks Workspace (AWS)
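For reference, the external location was created along these lines (a sketch; the storage credential itself holds a Cloudflare R2 API token and was created through Catalog Explorer, and <location_name> / <credential_name> are placeholders):

CREATE EXTERNAL LOCATION IF NOT EXISTS <location_name>
URL 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
WITH (STORAGE CREDENTIAL <credential_name>)
COMMENT 'External location for parquet files in R2';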

2. Created catalog: 

 

CREATE CATALOG IF NOT EXISTS <catalog_name>
MANAGED LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
COMMENT 'Location for managed tables and volumes to share using Delta Sharing';
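A schema isn't shown above but is needed before step 3; presumably something along the lines of:

CREATE SCHEMA IF NOT EXISTS <catalog_name>.<schema_name>;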

 

3. Created Delta table:

CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`;

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name>
USING DELTA
LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/';
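For what it's worth, if the underlying parquet files are partitioned, CONVERT TO DELTA needs the partition columns spelled out, which may be related to the error in step 5. A sketch, assuming a hypothetical date partition column:

CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`
PARTITIONED BY (date DATE);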

 

4. Created share: 

 

CREATE SHARE IF NOT EXISTS <share_name>;

5. Tried adding table to share (this fails):

ALTER SHARE <share_name>
ADD TABLE <catalog_name>.<schema_name>.<table_name>;

 

On step 5, I get the following error: 

 

[RequestId=<id> ErrorClass=INVALID_STATE] An error occurred while trying to validate the partition spec of a shared table.
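For reference, the partition spec that Unity Catalog recorded for the table can be inspected like this (a sketch; the partitionColumns field in the output is the relevant one):

DESCRIBE DETAIL <catalog_name>.<schema_name>.<table_name>;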

 

Step 5 works if I run the following after step 3 and use the new table instead, but this duplicates the data in R2, which is what I'm trying to avoid: 

 

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<new_table_name> DEEP CLONE <catalog_name>.<schema_name>.<old_table_name>
  LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<new_folder_name>/';

 

Steps 1-5 work when using an Amazon S3 external location instead of Cloudflare R2. Is there any way to share existing parquet files in R2 without duplicating them?

1 REPLY

turtleXturtle
New Contributor II

Thanks @Retired_mod. It's currently possible to share a Delta table stored in an S3 external location without duplicating the data or doing a `DEEP CLONE` first. Is it on the roadmap to support this for R2 as well?
