
Problem sharing a streaming table created in Delta Live Tables via Delta Sharing

vkuznetsov
New Contributor III

Hi all,

I hope you can help me figure out what I am missing.

I'm trying to do a simple thing: read data from the data ingestion zone (CSV files saved to an Azure Storage Account) using a Delta Live Tables pipeline and share the resulting table with another Databricks workspace via Delta Sharing.

Here is the code that describes the DLT pipeline.

import dlt
from pyspark.sql.functions import *
from pyspark.sql.types import *

raw_path = "/mnt/ingestion/sensors-readings"

@dlt.table(
    comment="Contains data received from sensors API"
)
def sensors_raw():
    # Use Auto Loader (cloudFiles) to incrementally pick up newly ingested files.
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", True)
            .load(raw_path)
    )

It runs successfully and the table is added to the target schema.

[Screenshot: the new table listed in the target schema]

But when I try to create a share, the table is not displayed.

[Screenshot: the Data Explorer share dialog, where the streaming table does not appear]

The tables that are available for sharing above were created from a notebook via df.write.saveAsTable().
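For reference, that pattern looks roughly like this (a minimal sketch; the target catalog, schema, and table names are hypothetical):

# Batch-read the ingested CSV files and register a regular managed table;
# tables created this way do appear when adding tables to a share.
df = (spark.read.format("csv")
          .option("header", True)
          .load("/mnt/ingestion/sensors-readings"))

df.write.mode("overwrite").saveAsTable("main.sensors.sensors_raw_batch")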

Reading the available documentation, I've seen that tables of type STREAMING_TABLE can't be shared via Delta Sharing.
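One way to confirm the table type (a sketch; the catalog and schema names are hypothetical) is to query the Unity Catalog information schema:

# Tables materialized by DLT report table_type = 'STREAMING_TABLE',
# while tables created with saveAsTable report 'MANAGED'.
spark.sql("""
    SELECT table_name, table_type
    FROM main.information_schema.tables
    WHERE table_schema = 'sensors'
""").show(truncate=False)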

Maybe I'm missing some setting? It would be great if you could help me figure it out.

Thanks.

5 REPLIES

vkuznetsov
New Contributor III

After reading the documentation more carefully, I found that this is mentioned in the current list of DLT and Unity Catalog limitations.


shan_chandra
Esteemed Contributor

@vkuznetsov - As a workaround, could you try converting the streaming table to a regular table with a stand-alone/periodic job, and then use that table for Delta Sharing?


# Continuously copy the DLT streaming table into a regular Delta table
# that can then be added to a Delta share.
(spark.readStream.table("<streaming-table>")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/checkpoint_bar_1")
    .partitionBy("<partition-column>")
    .toTable("<delta-table>"))
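If the copy runs as a scheduled job rather than an always-on stream, a triggered variant can process whatever has arrived and then stop (a sketch using the same placeholder names; availableNow requires Spark 3.3+ / a recent Databricks Runtime):

# Incrementally copy new rows from the streaming table into a regular
# Delta table, then shut down once all available data is processed.
(spark.readStream.table("<streaming-table>")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/checkpoint_bar_1")
    .partitionBy("<partition-column>")
    .trigger(availableNow=True)
    .toTable("<delta-table>"))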


Anonymous
Not applicable

Hi @vkuznetsov 

Thanks for posting your question in our community! We're glad to help.

To help us provide the most accurate information, please take a moment to review the responses and select the one that best answers your question.

This will also benefit other community members who have similar questions later on. Thank you for participating, and please don't hesitate to reach out if you need any further help!

vkuznetsov
New Contributor III

Hi @shan_chandra,

In the end, I did it the way you proposed. Unfortunately, this has to happen outside the Delta Live Tables workflow, so you lose some of its benefits.

It seems strange to me that DLT and Delta Sharing, two powerful tools, can't be combined out of the box. Or maybe I'm missing something, since I'm new to both.

jdog
New Contributor II

I'm curious whether Databricks plans to address this. We use DLT streaming tables extensively and had also planned on using Delta Sharing to get our data out of our production Unity Catalog (in a different region). Duplicating the data as a workaround is not practical for huge tables. What is Databricks' recommended practice for getting data to other Databricks workspaces across regions?
