Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta share existing parquet files in R2

turtleXturtle
New Contributor II

Hi - I have existing parquet files in Cloudflare R2 storage (created outside of Databricks). I would like to share them via Delta Sharing, but I keep running into an error. Is it possible to share existing parquet files without duplicating them?

I did the following steps:

1. Created storage credential and external location pointing to an R2 bucket in Databricks Workspace (AWS)
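
For reference, a minimal sketch of what step 1 amounts to in SQL (the storage credential itself I created through the Catalog Explorer UI, since R2 credentials use a Cloudflare API token; <location_name> and <credential_name> are placeholder names):

-- External location pointing at the R2 bucket, using the UI-created credential.
CREATE EXTERNAL LOCATION IF NOT EXISTS <location_name>
URL 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
WITH (STORAGE CREDENTIAL <credential_name>)
COMMENT 'R2 bucket containing the existing parquet files';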

2. Created catalog:

CREATE CATALOG IF NOT EXISTS <catalog_name>
MANAGED LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com'
COMMENT 'Location for managed tables and volumes to share using Delta Sharing';

3. Created Delta table:

CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`;

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name>
USING DELTA
LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/';
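
As a sanity check after step 3, DESCRIBE DETAIL (standard Databricks SQL) shows the table's format, location, and the partitionColumns field that the validation error below refers to:

-- Inspect the converted table's metadata, including partitionColumns.
DESCRIBE DETAIL <catalog_name>.<schema_name>.<table_name>;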

4. Created share:

CREATE SHARE IF NOT EXISTS <share_name>;

5. Tried adding table to share (this fails):

ALTER SHARE <share_name>
ADD TABLE <catalog_name>.<schema_name>.<table_name>;

On step 5, I get the following error:

[RequestId=<id> ErrorClass=INVALID_STATE] An error occurred while trying to validate the partition spec of a shared table.
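
One thing that may matter when reproducing this: if the underlying parquet data is hive-partitioned, CONVERT TO DELTA requires an explicit PARTITIONED BY clause, so a variant of step 3 would look like this (the partition column name and type are placeholders):

-- CONVERT TO DELTA must be told the partitioning schema of partitioned parquet data.
CONVERT TO DELTA parquet.`r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<folder_name>/`
PARTITIONED BY (<partition_column> STRING);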

Step 5 works if I run the following after step 3 and use the new table instead, but this duplicates the data in R2, which is what I'm trying to avoid:

CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<new_table_name> DEEP CLONE <catalog_name>.<schema_name>.<old_table_name>
  LOCATION 'r2://<bucket_name>@<account_id>.r2.cloudflarestorage.com/<new_folder_name>/';
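
For completeness, once a table is in the share via the clone workaround, the remaining Delta Sharing steps are the standard ones (<recipient_name> is a placeholder):

-- Create a recipient and grant it read access to the share.
CREATE RECIPIENT IF NOT EXISTS <recipient_name>;
GRANT SELECT ON SHARE <share_name> TO RECIPIENT <recipient_name>;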

Steps 1-5 work when using an Amazon S3 external location instead of Cloudflare R2. Is there any way to share existing parquet files in R2 without duplication?

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @turtleXturtle, while using DEEP CLONE does create a duplicate as you noted, it is currently the method that ensures the table is properly formatted for Delta Sharing. If avoiding duplication is critical, consider whether you can manage the data lifecycle to minimize the impact of the duplication.
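
If the clone route is unavoidable, one concrete reading of "manage the data lifecycle" is to drop the original converted table once the clone is verified. Note that this only removes the metastore entry for an external table; the original parquet files in R2 still have to be deleted separately with Cloudflare's own tooling:

-- Removes only the Unity Catalog entry for the external table; the files
-- under <folder_name>/ in R2 are not deleted and must be cleaned up separately.
DROP TABLE IF EXISTS <catalog_name>.<schema_name>.<old_table_name>;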

turtleXturtle
New Contributor II

Thanks @Kaniz_Fatma. It's currently possible to share a Delta table stored in an S3 external location without duplication or doing the `DEEP CLONE` first. Is it on the roadmap to support this for R2 as well?

Kaniz_Fatma
Community Manager

Hi @turtleXturtle, while Delta Sharing allows sharing Delta tables in general, sharing tables stored in R2 external locations does not appear to be a priority on the current Delta Lake roadmap. The roadmap is focused more on improving core Delta Lake features and integrations.
