Problem sharing a streaming table created in Delta Live Table via Delta Sharing

vkuznetsov
New Contributor III

Hi all,

I hope you could help me to figure out what I am missing.

I'm trying to do a simple thing. To read the data from the data ingestion zone (csv files saved to Azure Storage Account) using the Delta Live Tables pipeline and share the resulting table to another Databricks workspace using Delta Sharing.

Here is the code that defines the DLT pipeline:

import dlt

raw_path = "/mnt/ingestion/sensors-readings"

@dlt.table(
    comment="Contains data received from sensors API"
)
def sensors_raw():
    # Use Auto Loader (cloudFiles) to pick up newly ingested files only.
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load(raw_path)
    )
    return df

It runs successfully and the table is added to the target schema.

[Screenshot: the sensors_raw table shown in the target schema]

But when I try to create a share, the table is not displayed.

[Screenshot: Data Explorer share creation; the streaming table is not listed among the tables available for sharing]

The tables that do appear as available for sharing were created from a notebook with df.write.saveAsTable().
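
For context, those tables were produced by a plain batch write, roughly like this (a minimal sketch; the source path reuses the one above, and the table name is a hypothetical example):

# Sketch: a batch write that produces a regular managed table,
# which does appear as shareable. The table name is hypothetical.
df = spark.read.option("header", "true").csv("/mnt/ingestion/sensors-readings")
df.write.mode("overwrite").saveAsTable("sensors_batch")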

Reading the available documentation, I haven't seen anything saying that a STREAMING_TABLE can't be shared via Delta Sharing.
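
For reference, here is one way to confirm what kind of table DLT created (a sketch, assuming Unity Catalog; the catalog name my_catalog is a placeholder):

# Sketch: inspect the table type recorded in Unity Catalog.
# "my_catalog" is a hypothetical placeholder.
spark.sql("""
    SELECT table_name, table_type
    FROM my_catalog.information_schema.tables
    WHERE table_name = 'sensors_raw'
""").show()
# A table created by a DLT streaming query reports table_type = 'STREAMING_TABLE'.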

Maybe I'm missing some setting? It would be great if you could help me figure it out.

Thanks.

5 REPLIES

vkuznetsov
New Contributor III

After reading the documentation more carefully, I've found that this is mentioned in the current list of DLT and Unity Catalog limitations.

shan_chandra
Databricks Employee

@vkuznetsov - As a workaround, could you try converting the streaming table to a regular table with a stand-alone/periodic job and using that table for Delta Sharing?

# Continuously copy the streaming table into a regular Delta table.
(spark.readStream.table("<streaming-table>")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/checkpoint_bar_1")
    .partitionBy("<partition-column>")
    .toTable("<delta-table>"))

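If the copy runs as a scheduled job rather than a continuously running stream, one option is the availableNow trigger (a sketch, assuming Spark 3.3+ or a recent Databricks runtime; the checkpoint path is a hypothetical placeholder), which processes whatever is new and then stops:

# Sketch: each scheduled run copies the new data from the streaming
# table into a regular Delta table, then shuts down on its own.
# The checkpoint path is a hypothetical placeholder.
(spark.readStream.table("<streaming-table>")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/streaming_table_copy")
    .trigger(availableNow=True)  # process available data, then stop
    .toTable("<delta-table>"))
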
Anonymous
Not applicable

Hi @vkuznetsov,

Thanks for posting your question in our community! We're happy to help.

To make sure you get the most accurate information, please take a moment to review the responses and select the one that best answers your question.

Doing so will also help other community members who might have similar questions later on. Thanks for participating, and please don't hesitate to reach out if you need any further help!

vkuznetsov
New Contributor III

Hi @shan_chandra,

In the end, I did it the way you proposed. Unfortunately, this has to be done outside the Delta Live Tables workflow, so you can't use all of its benefits.

It seems strange to me that DLT and Delta Sharing, two powerful tools, can't be combined out of the box. Or maybe I'm missing something, since I'm new to both DLT and Delta Sharing.

jdog
New Contributor II

I'm curious whether Databricks plans to address this. We use Delta Live streaming tables extensively and also planned on using Delta Sharing to get our data from our production Unity Catalog (in a different region). Duplicating the data as a workaround is not practical for huge tables. What is Databricks' recommended practice for getting data to other Databricks workspaces across regions?
